Skip to content

Commit

Permalink
IB/sa: Fail requests made while creating new SM AH
Browse files Browse the repository at this point in the history
This patch solves a race that occurs after an event occurs that causes
the SA query module to flush its SM address handle (AH).  When SM AH
becomes invalid and needs an update it is handled by the global
workqueue.  On the other hand this event is also handled in the IPoIB
driver by queuing work in the ipoib_workqueue that does multicast
joins.  Although queuing is in the right order, it is done to 2
different workqueues and so there is no guarantee that the first to be
queued is the first to be executed.

This causes a problem because IPoIB may end up sending an request to
the old SM, which will take a long time to time out (since the old SM
is gone); this leads to a much longer than necessary interruption in
multicast traffer.

The patch sets the SA query module's SM AH to NULL when the event
occurs, and until update_sm_ah() is done, any request that needs sm_ah
fails with -EAGAIN return status.

For consumers, the patch doesn't make things worse.  Before the patch,
MADs are sent to the wrong SM so the request gets lost.  Consumers can
be improved if they examine the return code and respond to EAGAIN
properly but even without an improvement the situation is not getting
worse.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
  • Loading branch information
Moni Shoua authored and Roland Dreier committed Jul 15, 2008
1 parent a947491 commit 164ba08
Showing 1 changed file with 16 additions and 6 deletions.
22 changes: 16 additions & 6 deletions drivers/infiniband/core/sa_query.c
Original file line number Diff line number Diff line change
Expand Up @@ -361,7 +361,7 @@ static void update_sm_ah(struct work_struct *work)
{
struct ib_sa_port *port =
container_of(work, struct ib_sa_port, update_task);
struct ib_sa_sm_ah *new_ah, *old_ah;
struct ib_sa_sm_ah *new_ah;
struct ib_port_attr port_attr;
struct ib_ah_attr ah_attr;

Expand Down Expand Up @@ -397,12 +397,9 @@ static void update_sm_ah(struct work_struct *work)
}

spin_lock_irq(&port->ah_lock);
old_ah = port->sm_ah;
port->sm_ah = new_ah;
spin_unlock_irq(&port->ah_lock);

if (old_ah)
kref_put(&old_ah->ref, free_sm_ah);
}

static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event)
Expand All @@ -413,8 +410,17 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
event->event == IB_EVENT_PKEY_CHANGE ||
event->event == IB_EVENT_SM_CHANGE ||
event->event == IB_EVENT_CLIENT_REREGISTER) {
struct ib_sa_device *sa_dev;
sa_dev = container_of(handler, typeof(*sa_dev), event_handler);
unsigned long flags;
struct ib_sa_device *sa_dev =
container_of(handler, typeof(*sa_dev), event_handler);
struct ib_sa_port *port =
&sa_dev->port[event->element.port_num - sa_dev->start_port];

spin_lock_irqsave(&port->ah_lock, flags);
if (port->sm_ah)
kref_put(&port->sm_ah->ref, free_sm_ah);
port->sm_ah = NULL;
spin_unlock_irqrestore(&port->ah_lock, flags);

schedule_work(&sa_dev->port[event->element.port_num -
sa_dev->start_port].update_task);
Expand Down Expand Up @@ -519,6 +525,10 @@ static int alloc_mad(struct ib_sa_query *query, gfp_t gfp_mask)
unsigned long flags;

spin_lock_irqsave(&query->port->ah_lock, flags);
if (!query->port->sm_ah) {
spin_unlock_irqrestore(&query->port->ah_lock, flags);
return -EAGAIN;
}
kref_get(&query->port->sm_ah->ref);
query->sm_ah = query->port->sm_ah;
spin_unlock_irqrestore(&query->port->ah_lock, flags);
Expand Down

0 comments on commit 164ba08

Please sign in to comment.