Skip to content

Commit

Permalink
IPoIB: Fix send lockup due to missed TX completion
Browse files Browse the repository at this point in the history
Commit f0dc117 ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
  • Loading branch information
Mike Marciniszyn authored and Roland Dreier committed Mar 23, 2013
1 parent a937536 commit 1ee9e2a
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions drivers/infiniband/ulp/ipoib/ipoib_cm.c
Original file line number Diff line number Diff line change
Expand Up @@ -758,9 +758,13 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
if (++priv->tx_outstanding == ipoib_sendq_size) {
ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
tx->qp->qp_num);
if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP))
ipoib_warn(priv, "request notify on send CQ failed\n");
netif_stop_queue(dev);
rc = ib_req_notify_cq(priv->send_cq,
IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
if (rc < 0)
ipoib_warn(priv, "request notify on send CQ failed\n");
else if (rc)
ipoib_send_comp_handler(priv->send_cq, dev);
}
}
}
Expand Down

0 comments on commit 1ee9e2a

Please sign in to comment.