Skip to content

Commit

Permalink
IB/mthca: Don't allow userspace open while recovering from catastroph…
Browse files Browse the repository at this point in the history
…ic error

Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is in being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_dev_open() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
  • Loading branch information
Jack Morgenstein authored and Roland Dreier committed Sep 6, 2009
1 parent d94a868 commit d841064
Show file tree
Hide file tree
Showing 4 changed files with 7 additions and 0 deletions.
1 change: 1 addition & 0 deletions drivers/infiniband/hw/mthca/mthca_catas.c
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ static void handle_catas(struct mthca_dev *dev)
event.device = &dev->ib_dev;
event.event = IB_EVENT_DEVICE_FATAL;
event.element.port_num = 0;
dev->active = false;

ib_dispatch_event(&event);

Expand Down
1 change: 1 addition & 0 deletions drivers/infiniband/hw/mthca/mthca_dev.h
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,7 @@ struct mthca_dev {
struct ib_ah *sm_ah[MTHCA_MAX_PORTS];
spinlock_t sm_lock;
u8 rate[MTHCA_MAX_PORTS];
bool active;
};

#ifdef CONFIG_INFINIBAND_MTHCA_DEBUG
Expand Down
2 changes: 2 additions & 0 deletions drivers/infiniband/hw/mthca/mthca_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -1116,6 +1116,8 @@ static int __mthca_init_one(struct pci_dev *pdev, int hca_type)
pci_set_drvdata(pdev, mdev);
mdev->hca_type = hca_type;

mdev->active = true;

return 0;

err_unregister:
Expand Down
3 changes: 3 additions & 0 deletions drivers/infiniband/hw/mthca/mthca_provider.c
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,9 @@ static struct ib_ucontext *mthca_alloc_ucontext(struct ib_device *ibdev,
struct mthca_ucontext *context;
int err;

if (!(to_mdev(ibdev)->active))
return ERR_PTR(-EAGAIN);

memset(&uresp, 0, sizeof uresp);

uresp.qp_tab_size = to_mdev(ibdev)->limits.num_qps;
Expand Down

0 comments on commit d841064

Please sign in to comment.