Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kern…
…el/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
Linus Torvalds committed Nov 18, 2013
2 parents a709bd5 + b4fdf52 commit 1ea406c
Showing 52 changed files with 1,973 additions and 595 deletions.
13 changes: 13 additions & 0 deletions Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,12 @@ Description: Interface for making ib_srp connect to a new target.
interrupt is handled by a different CPU then the comp_vector
parameter can be used to spread the SRP completion workload
over multiple CPU's.
* tl_retry_count, a number in the range 2..7 specifying the
IB RC retry count.
* queue_size, the maximum number of commands that the
initiator is allowed to queue per SCSI host. The default
value for this parameter is 62. The lowest supported value
is 2.
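The limits stated for these two login parameters can be sketched as a small range check. This is only an illustration of the documented bounds, not the ib_srp parser: the struct, the macros, and both functions are hypothetical names, and since the text gives no default for tl_retry_count the caller has to supply one.

```c
#include <assert.h>
#include <stdbool.h>

/* Bounds copied from the documentation text; the macro names are made up. */
#define SKETCH_TL_RETRY_MIN   2   /* lowest tl_retry_count */
#define SKETCH_TL_RETRY_MAX   7   /* highest tl_retry_count */
#define SKETCH_QUEUE_SIZE_MIN 2   /* lowest supported queue_size */
#define SKETCH_QUEUE_SIZE_DEF 62  /* documented default queue_size */

/* Hypothetical container for the two parameters. */
struct sketch_login_params {
	int tl_retry_count;
	int queue_size;
};

/* Start from the documented queue_size default; tl_retry_count is caller-supplied. */
void sketch_params_init(struct sketch_login_params *p, int tl_retry_count)
{
	assert(p);
	p->tl_retry_count = tl_retry_count;
	p->queue_size = SKETCH_QUEUE_SIZE_DEF;
}

/* True when both values fall inside the documented ranges. */
bool sketch_params_valid(const struct sketch_login_params *p)
{
	assert(p);
	return p->tl_retry_count >= SKETCH_TL_RETRY_MIN &&
	       p->tl_retry_count <= SKETCH_TL_RETRY_MAX &&
	       p->queue_size >= SKETCH_QUEUE_SIZE_MIN;
}
```

The sketch applies no upper bound to queue_size because the text does not state one.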

What: /sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
Date: January 2, 2006
@@ -153,6 +159,13 @@ Contact: linux-rdma@vger.kernel.org
Description: InfiniBand service ID used for establishing communication with
the SRP target.

What: /sys/class/scsi_host/host<n>/sgid
Date: February 1, 2014
KernelVersion: 3.13
Contact: linux-rdma@vger.kernel.org
Description: InfiniBand GID of the source port used for communication with
the SRP target.

What: /sys/class/scsi_host/host<n>/zero_req_lim
Date: September 20, 2006
KernelVersion: 2.6.18
39 changes: 39 additions & 0 deletions Documentation/ABI/stable/sysfs-transport-srp
@@ -5,15 +5,54 @@ Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
Description: Instructs an SRP initiator to disconnect from a target and to
remove all LUNs imported from that target.

What: /sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
Date: February 1, 2014
KernelVersion: 3.13
Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
Description: Number of seconds the SCSI layer will wait after a transport
layer error has been observed before removing a target port.
Zero means immediate removal. Setting this attribute to "off"
will disable the dev_loss timer.

What: /sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
Date: February 1, 2014
KernelVersion: 3.13
Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
Description: Number of seconds the SCSI layer will wait after a transport
layer error has been observed before failing I/O. Zero means
failing I/O immediately. Setting this attribute to "off" will
disable the fast_io_fail timer.

What: /sys/class/srp_remote_ports/port-<h>:<n>/port_id
Date: June 27, 2007
KernelVersion: 2.6.24
Contact: linux-scsi@vger.kernel.org
Description: 16-byte local SRP port identifier in hexadecimal format. An
example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.

What: /sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
Date: February 1, 2014
KernelVersion: 3.13
Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
Description: Number of seconds the SCSI layer will wait after a reconnect
attempt failed before retrying. Setting this attribute to
"off" will disable time-based reconnecting.

What: /sys/class/srp_remote_ports/port-<h>:<n>/roles
Date: June 27, 2007
KernelVersion: 2.6.24
Contact: linux-scsi@vger.kernel.org
Description: Role of the remote port. Either "SRP Initiator" or "SRP Target".

What: /sys/class/srp_remote_ports/port-<h>:<n>/state
Date: February 1, 2014
KernelVersion: 3.13
Contact: linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
Description: State of the transport layer used for communication with the
remote port. "running" if the transport layer is operational;
"blocked" if a transport layer error has been encountered but
the fast_io_fail_tmo timer has not yet fired; "fail-fast"
after the fast_io_fail_tmo timer has fired and before the
"dev_loss_tmo" timer has fired; "lost" after the
"dev_loss_tmo" timer has fired and before the port is finally
removed.
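The relationship between the state attribute and the two timers can be sketched as a pure function of the time elapsed since the transport layer error. This is a minimal illustration of the documented descriptions, not the scsi_transport_srp code: the enum, the function names, and the SKETCH_TMO_OFF sentinel for a timer set to "off" are all hypothetical, and the sketch assumes both timers start counting at the moment the error is observed.

```c
#include <assert.h>

/* Hypothetical sentinel standing in for a timer that was set to "off". */
#define SKETCH_TMO_OFF (-1)

enum sketch_rport_state {
	SKETCH_RUNNING,
	SKETCH_BLOCKED,
	SKETCH_FAIL_FAST,
	SKETCH_LOST
};

/*
 * error_seen: non-zero once a transport layer error has been observed.
 * elapsed:    seconds since that error.
 * A timeout of zero fires immediately, matching "Zero means immediate".
 */
enum sketch_rport_state sketch_rport_state_at(int error_seen, int elapsed,
					      int fast_io_fail_tmo,
					      int dev_loss_tmo)
{
	assert(elapsed >= 0);
	if (!error_seen)
		return SKETCH_RUNNING;
	if (dev_loss_tmo != SKETCH_TMO_OFF && elapsed >= dev_loss_tmo)
		return SKETCH_LOST;
	if (fast_io_fail_tmo != SKETCH_TMO_OFF && elapsed >= fast_io_fail_tmo)
		return SKETCH_FAIL_FAST;
	return SKETCH_BLOCKED;
}

/* Map the enum to the strings listed in the attribute description. */
const char *sketch_rport_state_name(enum sketch_rport_state s)
{
	switch (s) {
	case SKETCH_RUNNING:   return "running";
	case SKETCH_BLOCKED:   return "blocked";
	case SKETCH_FAIL_FAST: return "fail-fast";
	case SKETCH_LOST:      return "lost";
	}
	return "unknown";
}
```

With both timers disabled, the sketch leaves the port in "blocked" indefinitely. Whether the real transport layer behaves that way is not stated in the text above.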
11 changes: 0 additions & 11 deletions drivers/infiniband/Kconfig
@@ -31,17 +31,6 @@ config INFINIBAND_USER_ACCESS
libibverbs, libibcm and a hardware driver library from
<http://www.openfabrics.org/git/>.

config INFINIBAND_EXPERIMENTAL_UVERBS_FLOW_STEERING
bool "Experimental and unstable ABI for userspace access to flow steering verbs"
depends on INFINIBAND_USER_ACCESS
depends on STAGING
---help---
The final ABI for userspace access to flow steering verbs
has not been defined. To use the current ABI, *WHICH WILL
CHANGE IN THE FUTURE*, say Y here.

If unsure, say N.

config INFINIBAND_USER_MEM
bool
depends on INFINIBAND_USER_ACCESS != n
5 changes: 1 addition & 4 deletions drivers/infiniband/core/cm.c
@@ -383,14 +383,11 @@ static int cm_alloc_id(struct cm_id_private *cm_id_priv)
{
unsigned long flags;
int id;
static int next_id;

idr_preload(GFP_KERNEL);
spin_lock_irqsave(&cm.lock, flags);

id = idr_alloc(&cm.local_id_table, cm_id_priv, next_id, 0, GFP_NOWAIT);
if (id >= 0)
next_id = max(id + 1, 0);
id = idr_alloc_cyclic(&cm.local_id_table, cm_id_priv, 0, 0, GFP_NOWAIT);

spin_unlock_irqrestore(&cm.lock, flags);
idr_preload_end();
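The hunk above drops the hand-rolled next_id bookkeeping in favour of idr_alloc_cyclic(). The idea of cyclic allocation can be sketched in user space with a small fixed table: each search starts just past the most recently allocated ID and wraps around, so a freed ID is not handed out again right away. This is an illustration only and not the kernel IDR implementation; the table size, struct, and function names are hypothetical, and -1 stands in for the kernel's error return.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_IDS 8  /* tiny table so wrap-around is easy to see */

struct sketch_ids {
	bool used[SKETCH_MAX_IDS];
	int next;                 /* where the next search starts */
};

void sketch_ids_init(struct sketch_ids *t)
{
	assert(t);
	for (int i = 0; i < SKETCH_MAX_IDS; i++)
		t->used[i] = false;
	t->next = 0;
}

/* Return the first free ID at or after t->next, wrapping once; -1 if full. */
int sketch_ids_alloc_cyclic(struct sketch_ids *t)
{
	assert(t && t->next >= 0 && t->next < SKETCH_MAX_IDS);
	for (int i = 0; i < SKETCH_MAX_IDS; i++) {
		int id = (t->next + i) % SKETCH_MAX_IDS;

		if (!t->used[id]) {
			t->used[id] = true;
			t->next = (id + 1) % SKETCH_MAX_IDS;
			return id;
		}
	}
	return -1;
}

void sketch_ids_free(struct sketch_ids *t, int id)
{
	assert(t && id >= 0 && id < SKETCH_MAX_IDS);
	t->used[id] = false;
}
```

Allocating 0 and 1, freeing 0, and allocating again yields 2 rather than 0 in this sketch. ID 0 is only reused once the search wraps.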
68 changes: 33 additions & 35 deletions drivers/infiniband/core/cma.c
@@ -328,28 +328,6 @@ static int cma_set_qkey(struct rdma_id_private *id_priv, u32 qkey)
return ret;
}

static int find_gid_port(struct ib_device *device, union ib_gid *gid, u8 port_num)
{
int i;
int err;
struct ib_port_attr props;
union ib_gid tmp;

err = ib_query_port(device, port_num, &props);
if (err)
return err;

for (i = 0; i < props.gid_tbl_len; ++i) {
err = ib_query_gid(device, port_num, i, &tmp);
if (err)
return err;
if (!memcmp(&tmp, gid, sizeof tmp))
return 0;
}

return -EADDRNOTAVAIL;
}

static void cma_translate_ib(struct sockaddr_ib *sib, struct rdma_dev_addr *dev_addr)
{
dev_addr->dev_type = ARPHRD_INFINIBAND;
@@ -371,13 +349,14 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
return ret;
}

static int cma_acquire_dev(struct rdma_id_private *id_priv)
static int cma_acquire_dev(struct rdma_id_private *id_priv,
struct rdma_id_private *listen_id_priv)
{
struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
struct cma_device *cma_dev;
union ib_gid gid, iboe_gid;
int ret = -ENODEV;
u8 port;
u8 port, found_port;
enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;

@@ -389,17 +368,39 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
iboe_addr_get_sgid(dev_addr, &iboe_gid);
memcpy(&gid, dev_addr->src_dev_addr +
rdma_addr_gid_offset(dev_addr), sizeof gid);
if (listen_id_priv &&
rdma_port_get_link_layer(listen_id_priv->id.device,
listen_id_priv->id.port_num) == dev_ll) {
cma_dev = listen_id_priv->cma_dev;
port = listen_id_priv->id.port_num;
if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
&found_port, NULL);
else
ret = ib_find_cached_gid(cma_dev->device, &gid,
&found_port, NULL);

if (!ret && (port == found_port)) {
id_priv->id.port_num = found_port;
goto out;
}
}
list_for_each_entry(cma_dev, &dev_list, list) {
for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
if (listen_id_priv &&
listen_id_priv->cma_dev == cma_dev &&
listen_id_priv->id.port_num == port)
continue;
if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
ret = find_gid_port(cma_dev->device, &iboe_gid, port);
ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
else
ret = find_gid_port(cma_dev->device, &gid, port);
ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);

if (!ret) {
id_priv->id.port_num = port;
if (!ret && (port == found_port)) {
id_priv->id.port_num = found_port;
goto out;
}
}
@@ -1292,7 +1293,7 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
}

mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING);
ret = cma_acquire_dev(conn_id);
ret = cma_acquire_dev(conn_id, listen_id);
if (ret)
goto err2;

@@ -1451,7 +1452,6 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
{
struct rdma_cm_id *new_cm_id;
struct rdma_id_private *listen_id, *conn_id;
struct net_device *dev = NULL;
struct rdma_cm_event event;
int ret;
struct ib_device_attr attr;
@@ -1481,7 +1481,7 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
goto out;
}

ret = cma_acquire_dev(conn_id);
ret = cma_acquire_dev(conn_id, listen_id);
if (ret) {
mutex_unlock(&conn_id->handler_mutex);
rdma_destroy_id(new_cm_id);
@@ -1529,8 +1529,6 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
cma_deref_id(conn_id);

out:
if (dev)
dev_put(dev);
mutex_unlock(&listen_id->handler_mutex);
return ret;
}
@@ -2066,7 +2064,7 @@ static void addr_handler(int status, struct sockaddr *src_addr,
goto out;

if (!status && !id_priv->cma_dev)
status = cma_acquire_dev(id_priv);
status = cma_acquire_dev(id_priv, NULL);

if (status) {
if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
@@ -2563,7 +2561,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
if (ret)
goto err1;

ret = cma_acquire_dev(id_priv);
ret = cma_acquire_dev(id_priv, NULL);
if (ret)
goto err1;
}
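The reworked cma_acquire_dev() in the hunks above first tries the device and port of the listening ID, and only then walks the remaining ports, skipping the one it already checked. A simplified user-space sketch of that search order follows. It is not the kernel code: the flat port array, the 64-bit stand-in for a 128-bit GID, and all names are hypothetical, and the probes counter exists only to make the ordering visible.

```c
#include <assert.h>

/* Hypothetical flattened view of (device, port) pairs and their GIDs. */
struct sketch_port {
	int dev;
	int port;
	unsigned long long gid;  /* real GIDs are 128 bits; shortened here */
};

/*
 * Return the index of the port whose GID matches, or -1 if none does.
 * listen_idx is the listener's port index, or -1 when there is no listener
 * (the callers above that pass NULL). *probes counts the comparisons made.
 */
int sketch_acquire_port(const struct sketch_port *ports, int nports,
			int listen_idx, unsigned long long gid, int *probes)
{
	assert(ports && probes);
	assert(listen_idx < nports);
	*probes = 0;

	/* Fast path: the port the request arrived on is the likely match. */
	if (listen_idx >= 0) {
		++*probes;
		if (ports[listen_idx].gid == gid)
			return listen_idx;
	}

	/* Slow path: every other port, skipping the one already checked. */
	for (int i = 0; i < nports; i++) {
		if (i == listen_idx)
			continue;
		++*probes;
		if (ports[i].gid == gid)
			return i;
	}
	return -1;
}
```

When the match is on the listener's own port, the sketch finishes after a single comparison regardless of how many ports exist.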
2 changes: 1 addition & 1 deletion drivers/infiniband/core/netlink.c
@@ -148,7 +148,7 @@ static int ibnl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
list_for_each_entry(client, &client_list, list) {
if (client->index == index) {
if (op < 0 || op >= client->nops ||
!client->cb_table[RDMA_NL_GET_OP(op)].dump)
!client->cb_table[op].dump)
return -EINVAL;

{
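After this change the callback table is indexed with the same op value that was just range-checked. A minimal user-space sketch of that check-then-index pattern is below. The types are hypothetical simplifications: here the table holds bare function pointers, whereas the kernel table holds entries with a dump member, and NULL stands in for the -EINVAL return.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_dump_fn)(void);

/* Hypothetical, simplified client: nops entries in cb_table. */
struct sketch_client {
	int nops;
	const sketch_dump_fn *cb_table;
};

/* Placeholder callback so the table has something to point at. */
int sketch_dump_noop(void)
{
	return 0;
}

/*
 * Validate op against the table size, then index with that same value.
 * Returns NULL when op is out of range or the slot has no callback.
 */
sketch_dump_fn sketch_lookup_dump(const struct sketch_client *client, int op)
{
	assert(client);
	if (op < 0 || op >= client->nops || !client->cb_table[op])
		return NULL;
	return client->cb_table[op];
}
```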
1 change: 1 addition & 0 deletions drivers/infiniband/core/sysfs.c
@@ -612,6 +612,7 @@ static ssize_t show_node_type(struct device *device,
switch (dev->node_type) {
case RDMA_NODE_IB_CA: return sprintf(buf, "%d: CA\n", dev->node_type);
case RDMA_NODE_RNIC: return sprintf(buf, "%d: RNIC\n", dev->node_type);
case RDMA_NODE_USNIC: return sprintf(buf, "%d: usNIC\n", dev->node_type);
case RDMA_NODE_IB_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type);
case RDMA_NODE_IB_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type);
default: return sprintf(buf, "%d: <unknown>\n", dev->node_type);
4 changes: 2 additions & 2 deletions drivers/infiniband/core/ucma.c
@@ -57,7 +57,7 @@ MODULE_LICENSE("Dual BSD/GPL");
static unsigned int max_backlog = 1024;

static struct ctl_table_header *ucma_ctl_table_hdr;
static ctl_table ucma_ctl_table[] = {
static struct ctl_table ucma_ctl_table[] = {
{
.procname = "max_backlog",
.data = &max_backlog,
@@ -271,7 +271,7 @@ static int ucma_event_handler(struct rdma_cm_id *cm_id,
goto out;
}
ctx->backlog--;
} else if (!ctx->uid) {
} else if (!ctx->uid || ctx->cm_id != cm_id) {
/*
* We ignore events for new connections until userspace has set
* their context. This can only happen if an error occurs on a
36 changes: 32 additions & 4 deletions drivers/infiniband/core/uverbs.h
@@ -47,6 +47,14 @@
#include <rdma/ib_umem.h>
#include <rdma/ib_user_verbs.h>

#define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \
do { \
(udata)->inbuf = (void __user *) (ibuf); \
(udata)->outbuf = (void __user *) (obuf); \
(udata)->inlen = (ilen); \
(udata)->outlen = (olen); \
} while (0)

/*
* Our lifetime rules for these structs are the following:
*
@@ -178,6 +186,22 @@ void ib_uverbs_event_handler(struct ib_event_handler *handler,
struct ib_event *event);
void ib_uverbs_dealloc_xrcd(struct ib_uverbs_device *dev, struct ib_xrcd *xrcd);

struct ib_uverbs_flow_spec {
union {
union {
struct ib_uverbs_flow_spec_hdr hdr;
struct {
__u32 type;
__u16 size;
__u16 reserved;
};
};
struct ib_uverbs_flow_spec_eth eth;
struct ib_uverbs_flow_spec_ipv4 ipv4;
struct ib_uverbs_flow_spec_tcp_udp tcp_udp;
};
};

#define IB_UVERBS_DECLARE_CMD(name) \
ssize_t ib_uverbs_##name(struct ib_uverbs_file *file, \
const char __user *buf, int in_len, \
@@ -217,9 +241,13 @@ IB_UVERBS_DECLARE_CMD(destroy_srq);
IB_UVERBS_DECLARE_CMD(create_xsrq);
IB_UVERBS_DECLARE_CMD(open_xrcd);
IB_UVERBS_DECLARE_CMD(close_xrcd);
#ifdef CONFIG_INFINIBAND_EXPERIMENTAL_UVERBS_FLOW_STEERING
IB_UVERBS_DECLARE_CMD(create_flow);
IB_UVERBS_DECLARE_CMD(destroy_flow);
#endif /* CONFIG_INFINIBAND_EXPERIMENTAL_UVERBS_FLOW_STEERING */

#define IB_UVERBS_DECLARE_EX_CMD(name) \
int ib_uverbs_ex_##name(struct ib_uverbs_file *file, \
struct ib_udata *ucore, \
struct ib_udata *uhw)

IB_UVERBS_DECLARE_EX_CMD(create_flow);
IB_UVERBS_DECLARE_EX_CMD(destroy_flow);

#endif /* UVERBS_H */
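The INIT_UDATA macro added in this header is wrapped in do { ... } while (0) so that it behaves as a single statement, and the extended command prototypes take two buffers, ucore and uhw. The user-space sketch below mirrors the macro's shape to show why the wrapper matters in an unbraced if/else. Everything here is a stand-in: struct sketch_udata is a hypothetical substitute for struct ib_udata, the __user annotation is dropped, and the way one input buffer is split into a core part and a hardware part is an assumption made for illustration, not the kernel's dispatch logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's udata descriptor. */
struct sketch_udata {
	void *inbuf;
	void *outbuf;
	size_t inlen;
	size_t outlen;
};

/* Same shape as INIT_UDATA above, minus the __user address-space casts. */
#define SKETCH_INIT_UDATA(udata, ibuf, obuf, ilen, olen)	\
	do {							\
		(udata)->inbuf  = (void *) (ibuf);		\
		(udata)->outbuf = (void *) (obuf);		\
		(udata)->inlen  = (ilen);			\
		(udata)->outlen = (olen);			\
	} while (0)

/*
 * Describe one input buffer as a core part followed by a hardware-specific
 * part (assumed layout). The unbraced if/else compiles only because the
 * macro expands to exactly one statement.
 */
void sketch_split_buffers(struct sketch_udata *ucore, struct sketch_udata *uhw,
			  char *in, size_t core_len, size_t hw_len)
{
	assert(ucore && uhw && in);
	if (core_len)
		SKETCH_INIT_UDATA(ucore, in, NULL, core_len, 0);
	else
		SKETCH_INIT_UDATA(ucore, NULL, NULL, 0, 0);
	SKETCH_INIT_UDATA(uhw, in + core_len, NULL, hw_len, 0);
}
```

Had the macro been a bare brace block, the semicolon after the first invocation would have terminated the if statement and orphaned the else.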