Skip to content

Commit

Permalink
[SCSI] zfcp: Recover from stalled outbound queue
Browse files Browse the repository at this point in the history
Depending on interruptions on some storage systems, the complete
channel can stall which looks like an outbound queue stall to Linux.
When trying to acquire a free SBAL for a non-SCSI command, zfcp waits
for 5 seconds for a free slot to appear. This is the right place to
detect a queue stall: If the wait times out, we assume a stalled queue
and try to recover this.

The overall strategy should be to trigger the erp from specific
events, and not try an overall escalation from one failed port to a
full-blown queue recovery. If we manage to send a command, the status
codes for this command or a timeout will trigger the right follow-on
actions.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
  • Loading branch information
Christof Schmitt authored and James Bottomley committed Jul 30, 2009
1 parent 85600f7 commit cbf1ed0
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion drivers/s390/scsi/zfcp_fsf.c
Original file line number Diff line number Diff line change
Expand Up @@ -670,8 +670,11 @@ static int zfcp_fsf_req_sbal_get(struct zfcp_adapter *adapter)
zfcp_fsf_sbal_check(adapter), 5 * HZ);
if (ret > 0)
return 0;
if (!ret)
if (!ret) {
atomic_inc(&adapter->qdio_outb_full);
/* assume hanging outbound queue, try queue recovery */
zfcp_erp_adapter_reopen(adapter, 0, "fsrsg_1", NULL);
}

spin_lock_bh(&adapter->req_q_lock);
return -EIO;
Expand Down

0 comments on commit cbf1ed0

Please sign in to comment.