Skip to content

Commit

Permalink
Merge branch 'rds-support-FRMR-and-cleanups'
Browse files Browse the repository at this point in the history
Santosh Shilimkar says:

====================
RDS: Major clean-up with couple of new features for 4.6

v3:
Re-generated the same series by omitting "-D" option from git format-patch
command. Since first patch has file removals, git apply/am can't deal
with it when formated with '-D' option.

v2:
Dropped module parameter from [PATCH 11/13] as suggested by David Miller

Series is generated against net-next but also applies against Linus's tip
cleanly. Entire patchset is available at below git tree:

 git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.6/net-next/rds_v2

The diff-stat looks bit scary since almost ~4K lines of code is
getting removed. Brief summary of the series:

- Drop the stale iWARP support:
	RDS iWarp support code has become stale and non testable for
	sometime.  As discussed and agreed earlier on list, am dropping
	its support for good. If new iWarp user(s) shows up in future,
	the plan is to adapt existing IB RDMA with special sink case.
- RDS gets SO_TIMESTAMP support
- Long due RDS maintainer entry gets updated
- Some RDS IB code refactoring towards new FastReg Memory registration (FRMR)
- Lastly the initial support for FRMR

RDS IB RDMA performance with FRMR is not yet as good as FMR and I do have
some patches in progress to address that. But they are not ready for 4.6
so I left them out of this series.

Also am keeping eye on new CQ API adaptations like other ULPs doing and
will try to adapt RDS for the same most likely in 4.7+ timeframe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Mar 2, 2016
2 parents afc3de9 + 1659185 commit fcb3f55
Show file tree
Hide file tree
Showing 27 changed files with 1,065 additions and 5,035 deletions.
4 changes: 1 addition & 3 deletions Documentation/networking/rds.txt
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,7 @@ to N*N if you use a connection-oriented socket transport like TCP.

RDS is not Infiniband-specific; it was designed to support different
transports. The current implementation used to support RDS over TCP as well
as IB. Work is in progress to support RDS over iWARP, and using DCE to
guarantee no dropped packets on Ethernet, it may be possible to use RDS over
UDP in the future.
as IB.

The high-level semantics of RDS from the application's point of view are

Expand Down
6 changes: 5 additions & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -9076,10 +9076,14 @@ S: Maintained
F: drivers/net/ethernet/rdc/r6040.c

RDS - RELIABLE DATAGRAM SOCKETS
M: Chien Yen <chien.yen@oracle.com>
M: Santosh Shilimkar <santosh.shilimkar@oracle.com>
L: netdev@vger.kernel.org
L: linux-rdma@vger.kernel.org
L: rds-devel@oss.oracle.com (moderated for non-subscribers)
W: https://oss.oracle.com/projects/rds/
S: Supported
F: net/rds/
F: Documentation/networking/rds.txt

READ-COPY UPDATE (RCU)
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Expand Down
7 changes: 3 additions & 4 deletions net/rds/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,13 @@ config RDS
depends on INET
---help---
The RDS (Reliable Datagram Sockets) protocol provides reliable,
sequenced delivery of datagrams over Infiniband, iWARP,
or TCP.
sequenced delivery of datagrams over Infiniband or TCP.

config RDS_RDMA
tristate "RDS over Infiniband and iWARP"
tristate "RDS over Infiniband"
depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS
---help---
Allow RDS to use Infiniband and iWARP as a transport.
Allow RDS to use Infiniband as a transport.
This transport supports RDMA operations.

config RDS_TCP
Expand Down
4 changes: 1 addition & 3 deletions net/rds/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,7 @@ rds-y := af_rds.o bind.o cong.o connection.o info.o message.o \
obj-$(CONFIG_RDS_RDMA) += rds_rdma.o
rds_rdma-y := rdma_transport.o \
ib.o ib_cm.o ib_recv.o ib_ring.o ib_send.o ib_stats.o \
ib_sysctl.o ib_rdma.o \
iw.o iw_cm.o iw_recv.o iw_ring.o iw_send.o iw_stats.o \
iw_sysctl.o iw_rdma.o
ib_sysctl.o ib_rdma.o ib_fmr.o ib_frmr.o


obj-$(CONFIG_RDS_TCP) += rds_tcp.o
Expand Down
26 changes: 26 additions & 0 deletions net/rds/af_rds.c
Original file line number Diff line number Diff line change
Expand Up @@ -277,6 +277,27 @@ static int rds_set_transport(struct rds_sock *rs, char __user *optval,
return rs->rs_transport ? 0 : -ENOPROTOOPT;
}

static int rds_enable_recvtstamp(struct sock *sk, char __user *optval,
int optlen)
{
int val, valbool;

if (optlen != sizeof(int))
return -EFAULT;

if (get_user(val, (int __user *)optval))
return -EFAULT;

valbool = val ? 1 : 0;

if (valbool)
sock_set_flag(sk, SOCK_RCVTSTAMP);
else
sock_reset_flag(sk, SOCK_RCVTSTAMP);

return 0;
}

static int rds_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
{
Expand Down Expand Up @@ -312,6 +333,11 @@ static int rds_setsockopt(struct socket *sock, int level, int optname,
ret = rds_set_transport(rs, optval, optlen);
release_sock(sock->sk);
break;
case SO_TIMESTAMP:
lock_sock(sock->sk);
ret = rds_enable_recvtstamp(sock->sk, optval, optlen);
release_sock(sock->sk);
break;
default:
ret = -ENOPROTOOPT;
}
Expand Down
47 changes: 29 additions & 18 deletions net/rds/ib.c
Original file line number Diff line number Diff line change
Expand Up @@ -42,15 +42,16 @@

#include "rds.h"
#include "ib.h"
#include "ib_mr.h"

unsigned int rds_ib_fmr_1m_pool_size = RDS_FMR_1M_POOL_SIZE;
unsigned int rds_ib_fmr_8k_pool_size = RDS_FMR_8K_POOL_SIZE;
unsigned int rds_ib_mr_1m_pool_size = RDS_MR_1M_POOL_SIZE;
unsigned int rds_ib_mr_8k_pool_size = RDS_MR_8K_POOL_SIZE;
unsigned int rds_ib_retry_count = RDS_IB_DEFAULT_RETRY_COUNT;

module_param(rds_ib_fmr_1m_pool_size, int, 0444);
MODULE_PARM_DESC(rds_ib_fmr_1m_pool_size, " Max number of 1M fmr per HCA");
module_param(rds_ib_fmr_8k_pool_size, int, 0444);
MODULE_PARM_DESC(rds_ib_fmr_8k_pool_size, " Max number of 8K fmr per HCA");
module_param(rds_ib_mr_1m_pool_size, int, 0444);
MODULE_PARM_DESC(rds_ib_mr_1m_pool_size, " Max number of 1M mr per HCA");
module_param(rds_ib_mr_8k_pool_size, int, 0444);
MODULE_PARM_DESC(rds_ib_mr_8k_pool_size, " Max number of 8K mr per HCA");
module_param(rds_ib_retry_count, int, 0444);
MODULE_PARM_DESC(rds_ib_retry_count, " Number of hw retries before reporting an error");

Expand Down Expand Up @@ -139,14 +140,20 @@ static void rds_ib_add_one(struct ib_device *device)
rds_ibdev->max_wrs = device->attrs.max_qp_wr;
rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);

rds_ibdev->has_fr = (device->attrs.device_cap_flags &
IB_DEVICE_MEM_MGT_EXTENSIONS);
rds_ibdev->has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
device->map_phys_fmr && device->unmap_fmr);
rds_ibdev->use_fastreg = (rds_ibdev->has_fr && !rds_ibdev->has_fmr);

rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
rds_ibdev->max_1m_fmrs = device->attrs.max_mr ?
rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
min_t(unsigned int, (device->attrs.max_mr / 2),
rds_ib_fmr_1m_pool_size) : rds_ib_fmr_1m_pool_size;
rds_ib_mr_1m_pool_size) : rds_ib_mr_1m_pool_size;

rds_ibdev->max_8k_fmrs = device->attrs.max_mr ?
rds_ibdev->max_8k_mrs = device->attrs.max_mr ?
min_t(unsigned int, ((device->attrs.max_mr / 2) * RDS_MR_8K_SCALE),
rds_ib_fmr_8k_pool_size) : rds_ib_fmr_8k_pool_size;
rds_ib_mr_8k_pool_size) : rds_ib_mr_8k_pool_size;

rds_ibdev->max_initiator_depth = device->attrs.max_qp_init_rd_atom;
rds_ibdev->max_responder_resources = device->attrs.max_qp_rd_atom;
Expand All @@ -172,10 +179,14 @@ static void rds_ib_add_one(struct ib_device *device)
goto put_dev;
}

rdsdebug("RDS/IB: max_mr = %d, max_wrs = %d, max_sge = %d, fmr_max_remaps = %d, max_1m_fmrs = %d, max_8k_fmrs = %d\n",
rdsdebug("RDS/IB: max_mr = %d, max_wrs = %d, max_sge = %d, fmr_max_remaps = %d, max_1m_mrs = %d, max_8k_mrs = %d\n",
device->attrs.max_fmr, rds_ibdev->max_wrs, rds_ibdev->max_sge,
rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_fmrs,
rds_ibdev->max_8k_fmrs);
rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_mrs,
rds_ibdev->max_8k_mrs);

pr_info("RDS/IB: %s: %s supported and preferred\n",
device->name,
rds_ibdev->use_fastreg ? "FRMR" : "FMR");

INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
INIT_LIST_HEAD(&rds_ibdev->conn_list);
Expand Down Expand Up @@ -364,7 +375,7 @@ void rds_ib_exit(void)
rds_ib_sysctl_exit();
rds_ib_recv_exit();
rds_trans_unregister(&rds_ib_transport);
rds_ib_fmr_exit();
rds_ib_mr_exit();
}

struct rds_transport rds_ib_transport = {
Expand Down Expand Up @@ -400,13 +411,13 @@ int rds_ib_init(void)

INIT_LIST_HEAD(&rds_ib_devices);

ret = rds_ib_fmr_init();
ret = rds_ib_mr_init();
if (ret)
goto out;

ret = ib_register_client(&rds_ib_client);
if (ret)
goto out_fmr_exit;
goto out_mr_exit;

ret = rds_ib_sysctl_init();
if (ret)
Expand All @@ -430,8 +441,8 @@ int rds_ib_init(void)
rds_ib_sysctl_exit();
out_ibreg:
rds_ib_unregister_client();
out_fmr_exit:
rds_ib_fmr_exit();
out_mr_exit:
rds_ib_mr_exit();
out:
return ret;
}
Expand Down
37 changes: 14 additions & 23 deletions net/rds/ib.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,17 +9,12 @@
#include "rds.h"
#include "rdma_transport.h"

#define RDS_FMR_1M_POOL_SIZE (8192 / 2)
#define RDS_FMR_1M_MSG_SIZE 256
#define RDS_FMR_8K_MSG_SIZE 2
#define RDS_MR_8K_SCALE (256 / (RDS_FMR_8K_MSG_SIZE + 1))
#define RDS_FMR_8K_POOL_SIZE (RDS_MR_8K_SCALE * (8192 / 2))

#define RDS_IB_MAX_SGE 8
#define RDS_IB_RECV_SGE 2

#define RDS_IB_DEFAULT_RECV_WR 1024
#define RDS_IB_DEFAULT_SEND_WR 256
#define RDS_IB_DEFAULT_FR_WR 512

#define RDS_IB_DEFAULT_RETRY_COUNT 2

Expand All @@ -28,7 +23,6 @@
#define RDS_IB_RECYCLE_BATCH_COUNT 32

#define RDS_IB_WC_MAX 32
#define RDS_IB_SEND_OP BIT_ULL(63)

extern struct rw_semaphore rds_ib_devices_lock;
extern struct list_head rds_ib_devices;
Expand Down Expand Up @@ -129,6 +123,9 @@ struct rds_ib_connection {
struct ib_wc i_send_wc[RDS_IB_WC_MAX];
struct ib_wc i_recv_wc[RDS_IB_WC_MAX];

/* To control the number of wrs from fastreg */
atomic_t i_fastreg_wrs;

/* interrupt handling */
struct tasklet_struct i_send_tasklet;
struct tasklet_struct i_recv_tasklet;
Expand Down Expand Up @@ -207,12 +204,16 @@ struct rds_ib_device {
struct list_head conn_list;
struct ib_device *dev;
struct ib_pd *pd;
unsigned int max_fmrs;
bool has_fmr;
bool has_fr;
bool use_fastreg;

unsigned int max_mrs;
struct rds_ib_mr_pool *mr_1m_pool;
struct rds_ib_mr_pool *mr_8k_pool;
unsigned int fmr_max_remaps;
unsigned int max_8k_fmrs;
unsigned int max_1m_fmrs;
unsigned int max_8k_mrs;
unsigned int max_1m_mrs;
int max_sge;
unsigned int max_wrs;
unsigned int max_initiator_depth;
Expand Down Expand Up @@ -266,6 +267,8 @@ struct rds_ib_statistics {
uint64_t s_ib_rdma_mr_1m_pool_flush;
uint64_t s_ib_rdma_mr_1m_pool_wait;
uint64_t s_ib_rdma_mr_1m_pool_depleted;
uint64_t s_ib_rdma_mr_8k_reused;
uint64_t s_ib_rdma_mr_1m_reused;
uint64_t s_ib_atomic_cswp;
uint64_t s_ib_atomic_fadd;
};
Expand Down Expand Up @@ -317,8 +320,6 @@ struct rds_ib_device *rds_ib_get_client_data(struct ib_device *device);
void rds_ib_dev_put(struct rds_ib_device *rds_ibdev);
extern struct ib_client rds_ib_client;

extern unsigned int rds_ib_fmr_1m_pool_size;
extern unsigned int rds_ib_fmr_8k_pool_size;
extern unsigned int rds_ib_retry_count;

extern spinlock_t ib_nodev_conns_lock;
Expand Down Expand Up @@ -348,17 +349,7 @@ int rds_ib_update_ipaddr(struct rds_ib_device *rds_ibdev, __be32 ipaddr);
void rds_ib_add_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
void rds_ib_remove_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
void rds_ib_destroy_nodev_conns(void);
struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_dev,
int npages);
void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev, struct rds_info_rdma_connection *iinfo);
void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *);
void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
struct rds_sock *rs, u32 *key_ret);
void rds_ib_sync_mr(void *trans_private, int dir);
void rds_ib_free_mr(void *trans_private, int invalidate);
void rds_ib_flush_mrs(void);
int rds_ib_fmr_init(void);
void rds_ib_fmr_exit(void);
void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);

/* ib_recv.c */
int rds_ib_recv_init(void);
Expand Down
Loading

0 comments on commit fcb3f55

Please sign in to comment.