RDMA/nes: Fix nes_nic_cm_xmit() error handling
We see crashes or hangs when running network cable pull tests during
RDMA traffic.

In schedule_nes_timer(), we return an error if nes_nic_cm_xmit()
fails.  Change this to return success, since the skb has already been
put on the timer routines to be processed later.  In the send_syn()
case, the error return made us indicate connect failure twice: once
from nes_connect() and again when the rexmit retries expire.

The other issue is skb->users, which we increment before calling
nes_nic_cm_xmit(), which in turn calls dev_queue_xmit().  On failure
we decrement skb->users and at the same time put the skb on the rexmit
path, even though skb->users has already been decremented when
dev_queue_xmit() fails.  Remove the decrement of skb->users on failure
from both schedule_nes_timer() and nes_cm_timer_tick().
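
The reference-count problem described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the driver code: struct toy_skb, toy_xmit() and toy_send_and_queue() are hypothetical names, and the model assumes, as the message states, that the transmit path has already dropped its reference by the time it reports failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct toy_skb {
	int users;
};

/*
 * Models a transmit call that consumes one reference whether or not
 * the send succeeds (assumption taken from the commit message).
 * Returns nonzero on failure.
 */
static int toy_xmit(struct toy_skb *skb, bool fail)
{
	assert(skb->users > 0);
	skb->users--;
	return fail ? 1 : 0;
}

/*
 * Sends once, lets the send fail, and returns the reference count
 * left for the retransmit entry that still points at the skb.
 */
static int toy_send_and_queue(bool extra_dec_on_failure)
{
	struct toy_skb skb = { .users = 1 };	/* held by the retransmit entry */

	skb.users++;				/* taken for the transmit attempt */
	if (toy_xmit(&skb, true) != 0 && extra_dec_on_failure)
		skb.users--;			/* pre-fix: second drop of the same reference */

	return skb.users;
}
```

In this model, toy_send_and_queue(true) leaves the count at zero while the retransmit entry still refers to the skb, whereas toy_send_and_queue(false) leaves the one reference the entry needs.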

Also remove the extra check in nes_cm_timer_tick() that breaks out of
the loop on rexmit failure.  The break causes problems because the
remaining nodes have already had their cm_node->ref_count incremented
and are then never processed.
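
The effect of the early break can be sketched the same way. This is a simplified model, not the driver's loop: struct toy_node, toy_timer_tick() and toy_scenario() are hypothetical, and the two-pass structure (take a reference on every node, then process each node and drop it) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection node. */
struct toy_node {
	int ref_count;
	bool xmit_fails;
};

/* Returns how many nodes still hold the extra reference afterwards. */
static size_t toy_timer_tick(struct toy_node *nodes, size_t n,
			     bool break_on_failure)
{
	size_t i, leaked = 0;

	assert(nodes != NULL);

	for (i = 0; i < n; i++)
		nodes[i].ref_count++;		/* pass 1: pin every node */

	for (i = 0; i < n; i++) {
		bool failed = nodes[i].xmit_fails;

		nodes[i].ref_count--;		/* plays the role of rem_ref_cm_node() */
		if (failed && break_on_failure)
			break;			/* pre-fix: later nodes are skipped */
	}

	for (i = 0; i < n; i++)
		if (nodes[i].ref_count > 1)
			leaked++;
	return leaked;
}

/* Three nodes, each starting with one reference; the first one fails. */
static size_t toy_scenario(bool break_on_failure)
{
	struct toy_node nodes[3] = { { 1, true }, { 1, false }, { 1, false } };

	return toy_timer_tick(nodes, 3, break_on_failure);
}
```

With the break in place, the two nodes after the failing one keep their extra reference; without it, every node is processed and released.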

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Faisal Latif authored and Roland Dreier committed Apr 8, 2009
1 parent 79fc3d7 commit 5962c2c
Showing 1 changed file with 1 addition and 7 deletions.
8 changes: 1 addition & 7 deletions drivers/infiniband/hw/nes/nes_cm.c
@@ -446,8 +446,8 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 	if (ret != NETDEV_TX_OK) {
 		nes_debug(NES_DBG_CM, "Error sending packet %p "
 			"(jiffies = %lu)\n", new_send, jiffies);
-		atomic_dec(&new_send->skb->users);
 		new_send->timetosend = jiffies;
+		ret = NETDEV_TX_OK;
 	} else {
 		cm_packets_sent++;
 		if (!send_retrans) {
@@ -631,7 +631,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 	nes_debug(NES_DBG_CM, "rexmit failed for "
 		"node=%p\n", cm_node);
 	cm_packets_bounced++;
-	atomic_dec(&send_entry->skb->users);
 	send_entry->retrycount--;
 	nexttimeout = jiffies + NES_SHORT_TIME;
 	settimer = 1;
@@ -667,11 +666,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 
 		spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags);
 		rem_ref_cm_node(cm_node->cm_core, cm_node);
-		if (ret != NETDEV_TX_OK) {
-			nes_debug(NES_DBG_CM, "rexmit failed for cm_node=%p\n",
-				cm_node);
-			break;
-		}
 	}
 
 	if (settimer) {