Skip to content

Commit

Permalink
[SCSI] fc transport: resolve scan vs delete deadlocks
Browse files Browse the repository at this point in the history
In a prior posting to linux-scsi on the fc transport and workq
deadlocks, we noted a second error that did not have a patch:
  http://marc.theaimsgroup.com/?l=linux-scsi&m=114467847711383&w=2
  - There's a deadlock where scsi_remove_target() has to sit behind
    scsi_scan_target() due to contention over the scan_lock().

Subsequently we posted a request for comments about the deadlock:
  http://marc.theaimsgroup.com/?l=linux-scsi&m=114469358829500&w=2

This posting resolves the second error. Here's what we now understand,
and are implementing:

  If the lldd deletes the rport while a scan is active, the sdev's queue
  is blocked which stops the issuing of commands associated with the scan.
  At this point, the scan stalls, and does so with the shost->scan_mutex held.
  If, at this point, if any scan or delete request is made on the host, it
  will stall waiting for the scan_mutex.

  For the FC transport, we queue all delete work to a single workq.
  So, things worked fine when competing with the scan, as long as the
  target blocking the scan was the same target at the top of our delete
  workq, as the delete workq routine always unblocked just prior to
  requesting the delete.  Unfortunately, if the top of our delete workq
  was for a different target, we deadlock.  Additionally, if the target
  blocking scan returned, we were unblocking it in the scan workq routine,
  which really won't execute until the existing stalled scan workq
  completes (e.g. we're re-scheduling it while it is in the midst of its
  execution).

  This patch moves the unblock out of the workq routines and moves it to
  the context that is scheduling the work. This ensures that at some point,
  we will unblock the target that is blocking scan.  Please note, however,
  that the deadlock condition may still occur while it waits for the
  transport to timeout an unblock on a target.  Worst case, this is bounded
  by the transport dev_loss_tmo (default: 30 seconds).

Finally, Michael Reed deserves the credit for the bulk of this patch,
analysis, and it's testing. Thank you for your help.

Note: The request for comments statements about the gross-ness of the
  scan_mutex still stand.

Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
  • Loading branch information
James Smart authored and James Bottomley committed Jun 27, 2006
1 parent 79ac674 commit a0785ed
Showing 1 changed file with 15 additions and 13 deletions.
28 changes: 15 additions & 13 deletions drivers/scsi/scsi_transport_fc.c
Original file line number Diff line number Diff line change
Expand Up @@ -1284,7 +1284,9 @@ EXPORT_SYMBOL(fc_release_transport);
* @work: Work to queue for execution.
*
* Return value:
* 0 on success / != 0 for error
* 1 - work queued for execution
* 0 - work is already queued
* -EINVAL - work queue doesn't exist
**/
static int
fc_queue_work(struct Scsi_Host *shost, struct work_struct *work)
Expand Down Expand Up @@ -1434,8 +1436,6 @@ fc_starget_delete(void *data)
struct Scsi_Host *shost = rport_to_shost(rport);
unsigned long flags;

scsi_target_unblock(&rport->dev);

spin_lock_irqsave(shost->host_lock, flags);
if (rport->flags & FC_RPORT_DEVLOSS_PENDING) {
spin_unlock_irqrestore(shost->host_lock, flags);
Expand Down Expand Up @@ -1707,6 +1707,8 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,

spin_unlock_irqrestore(shost->host_lock, flags);

scsi_target_unblock(&rport->dev);

return rport;
}
}
Expand Down Expand Up @@ -1762,9 +1764,10 @@ fc_remote_port_add(struct Scsi_Host *shost, int channel,
/* initiate a scan of the target */
rport->flags |= FC_RPORT_SCAN_PENDING;
scsi_queue_work(shost, &rport->scan_work);
}

spin_unlock_irqrestore(shost->host_lock, flags);
spin_unlock_irqrestore(shost->host_lock, flags);
scsi_target_unblock(&rport->dev);
} else
spin_unlock_irqrestore(shost->host_lock, flags);

return rport;
}
Expand Down Expand Up @@ -1938,6 +1941,7 @@ fc_remote_port_rolechg(struct fc_rport *rport, u32 roles)
rport->flags |= FC_RPORT_SCAN_PENDING;
scsi_queue_work(shost, &rport->scan_work);
spin_unlock_irqrestore(shost->host_lock, flags);
scsi_target_unblock(&rport->dev);
}
}
EXPORT_SYMBOL(fc_remote_port_rolechg);
Expand Down Expand Up @@ -1970,8 +1974,9 @@ fc_timeout_deleted_rport(void *data)
dev_printk(KERN_ERR, &rport->dev,
"blocked FC remote port time out: no longer"
" a FCP target, removing starget\n");
fc_queue_work(shost, &rport->stgt_delete_work);
spin_unlock_irqrestore(shost->host_lock, flags);
scsi_target_unblock(&rport->dev);
fc_queue_work(shost, &rport->stgt_delete_work);
return;
}

Expand Down Expand Up @@ -2035,17 +2040,15 @@ fc_timeout_deleted_rport(void *data)
* went away and didn't come back - we'll remove
* all attached scsi devices.
*/
fc_queue_work(shost, &rport->stgt_delete_work);

spin_unlock_irqrestore(shost->host_lock, flags);

scsi_target_unblock(&rport->dev);
fc_queue_work(shost, &rport->stgt_delete_work);
}

/**
* fc_scsi_scan_rport - called to perform a scsi scan on a remote port.
*
* Will unblock the target (in case it went away and has now come back),
* then invoke a scan.
*
* @data: remote port to be scanned.
**/
static void
Expand All @@ -2057,7 +2060,6 @@ fc_scsi_scan_rport(void *data)

if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
(rport->roles & FC_RPORT_ROLE_FCP_TARGET)) {
scsi_target_unblock(&rport->dev);
scsi_scan_target(&rport->dev, rport->channel,
rport->scsi_target_id, SCAN_WILD_CARD, 1);
}
Expand Down

0 comments on commit a0785ed

Please sign in to comment.