xfs: introduce inode cluster buffer trylocks for xfs_iflush
There is an ABBA deadlock between synchronous inode flushing in
xfs_reclaim_inode and xfs_ifree_cluster. xfs_ifree_cluster locks the
buffer, then takes inode ilocks, whilst synchronous reclaim takes
the ilock followed by the buffer lock in xfs_iflush().

To avoid this deadlock, separate the inode cluster buffer locking
semantics from the synchronous inode flush semantics, allowing
callers to attempt to lock the buffer but still issue synchronous IO
if they can get the buffer. This requires xfs_iflush() calls that
currently use non-blocking semantics to pass SYNC_TRYLOCK rather
than 0 as the flags parameter.

This allows xfs_reclaim_inode to avoid the deadlock on the buffer
lock and detect the failure so that it can drop the inode ilock and
restart the reclaim attempt on the inode. This allows
xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
release it and hence defuse the deadlock situation. It also has the
pleasant side effect of avoiding IO in xfs_reclaim_inode when it
next tries to reclaim the inode, as it is now marked stale.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
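
The back-off described in the message above can be sketched in userspace. This is a minimal illustration with POSIX mutexes, not the kernel code: `demo_inode`, `demo_flush_trylock` and `demo_reclaim` are hypothetical names, the retry loop is bounded so the demo terminates (the patch itself retries via `goto restart`), and `sched_yield()` stands in for `delay(2)`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in XFS. */
struct demo_inode {
	pthread_mutex_t ilock;        /* plays the role of the inode ilock */
	pthread_mutex_t *cluster_buf; /* plays the role of the cluster buffer lock */
	bool stale;                   /* set by the cluster-free side */
	bool dirty;                   /* inode still needs flushing */
};

/*
 * Flush attempt, called with the ilock held. The buffer lock is only
 * tried, never waited for, so holding the ilock here cannot deadlock
 * against a thread that holds the buffer and wants the ilock.
 */
static int demo_flush_trylock(struct demo_inode *ip)
{
	if (pthread_mutex_trylock(ip->cluster_buf) != 0)
		return EAGAIN;
	ip->dirty = false; /* stands in for writing the inode to the buffer */
	pthread_mutex_unlock(ip->cluster_buf);
	return 0;
}

/*
 * Reclaim with back-off: on EAGAIN, drop the ilock so the buffer holder
 * can take it and mark the inode stale, then try again. Returns 0 once
 * the inode is reclaimable, or EAGAIN after max_tries failed attempts.
 */
static int demo_reclaim(struct demo_inode *ip, int max_tries)
{
	assert(max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		int error;

		pthread_mutex_lock(&ip->ilock);
		if (ip->stale || !ip->dirty) {
			/* Nothing to flush: reclaim without any IO. */
			pthread_mutex_unlock(&ip->ilock);
			return 0;
		}
		error = demo_flush_trylock(ip);
		pthread_mutex_unlock(&ip->ilock);
		if (error != EAGAIN)
			return error;
		sched_yield(); /* back off before retrying */
	}
	return EAGAIN;
}
```

If another thread holds the buffer lock, `demo_reclaim()` never sleeps on it while holding `ilock`. Once that thread has marked the inode stale, the next pass returns without flushing, which mirrors the side effect the message describes.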
Dave Chinner authored and Dave Chinner committed Mar 25, 2011
1 parent a19fb38 commit 1bfd8d0
Showing 4 changed files with 32 additions and 8 deletions.
2 changes: 1 addition & 1 deletion fs/xfs/linux-2.6/xfs_super.c
@@ -1078,7 +1078,7 @@ xfs_fs_write_inode(
error = 0;
goto out_unlock;
}
-error = xfs_iflush(ip, 0);
+error = xfs_iflush(ip, SYNC_TRYLOCK);
}

out_unlock:
30 changes: 27 additions & 3 deletions fs/xfs/linux-2.6/xfs_sync.c
@@ -761,8 +761,10 @@ xfs_reclaim_inode(
struct xfs_perag *pag,
int sync_mode)
{
-int error = 0;
+int error;

+restart:
+error = 0;
xfs_ilock(ip, XFS_ILOCK_EXCL);
if (!xfs_iflock_nowait(ip)) {
if (!(sync_mode & SYNC_WAIT))
@@ -788,9 +790,31 @@ xfs_reclaim_inode(
if (xfs_inode_clean(ip))
goto reclaim;

-/* Now we have an inode that needs flushing */
-error = xfs_iflush(ip, sync_mode);
+/*
+* Now we have an inode that needs flushing.
+*
+* We do a nonblocking flush here even if we are doing a SYNC_WAIT
+* reclaim as we can deadlock with inode cluster removal.
+* xfs_ifree_cluster() can lock the inode buffer before it locks the
+* ip->i_lock, and we are doing the exact opposite here. As a result,
+* doing a blocking xfs_itobp() to get the cluster buffer will result
+* in an ABBA deadlock with xfs_ifree_cluster().
+*
+* As xfs_ifree_cluster() must gather all inodes that are active in the
+* cache to mark them stale, if we hit this case we don't actually want
+* to do IO here - we want the inode marked stale so we can simply
+* reclaim it. Hence if we get an EAGAIN error on a SYNC_WAIT flush,
+* just unlock the inode, back off and try again. Hopefully the next
+* pass through will see the stale flag set on the inode.
+*/
+error = xfs_iflush(ip, SYNC_TRYLOCK | sync_mode);
if (sync_mode & SYNC_WAIT) {
+if (error == EAGAIN) {
+xfs_iunlock(ip, XFS_ILOCK_EXCL);
+/* backoff longer than in xfs_ifree_cluster */
+delay(2);
+goto restart;
+}
xfs_iflock(ip);
goto reclaim;
}
2 changes: 1 addition & 1 deletion fs/xfs/xfs_inode.c
@@ -2835,7 +2835,7 @@ xfs_iflush(
* Get the buffer containing the on-disk inode.
*/
error = xfs_itobp(mp, NULL, ip, &dip, &bp,
-(flags & SYNC_WAIT) ? XBF_LOCK : XBF_TRYLOCK);
+(flags & SYNC_TRYLOCK) ? XBF_TRYLOCK : XBF_LOCK);
if (error || !bp) {
xfs_ifunlock(ip);
return error;
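
The effect of this one-line change on callers can be sketched as follows. This is an illustration under assumptions, not the kernel source: the `DEMO_*` names and their numeric values are made up for the example, and only the two conditional expressions are taken from the hunk above.

```c
#include <assert.h>

/* Illustrative flag values only; the real definitions live in the XFS headers. */
enum { DEMO_SYNC_WAIT = 0x1, DEMO_SYNC_TRYLOCK = 0x2 };

enum demo_lock_mode { DEMO_BUF_LOCK, DEMO_BUF_TRYLOCK };

/* Before the patch: the buffer lock blocks only for SYNC_WAIT callers. */
static enum demo_lock_mode demo_mode_before(int flags)
{
	return (flags & DEMO_SYNC_WAIT) ? DEMO_BUF_LOCK : DEMO_BUF_TRYLOCK;
}

/* After the patch: the buffer lock blocks unless SYNC_TRYLOCK is passed. */
static enum demo_lock_mode demo_mode_after(int flags)
{
	return (flags & DEMO_SYNC_TRYLOCK) ? DEMO_BUF_TRYLOCK : DEMO_BUF_LOCK;
}

/* Checks the cases that matter for the callers touched by this commit. */
static void demo_check_flag_mapping(void)
{
	/* A caller passing 0 used to get a trylock and would now block... */
	assert(demo_mode_before(0) == DEMO_BUF_TRYLOCK);
	assert(demo_mode_after(0) == DEMO_BUF_LOCK);
	/* ...so it has to pass SYNC_TRYLOCK to keep non-blocking behaviour. */
	assert(demo_mode_after(DEMO_SYNC_TRYLOCK) == DEMO_BUF_TRYLOCK);
	/* Reclaim can now combine a trylock with a synchronous flush. */
	assert(demo_mode_after(DEMO_SYNC_TRYLOCK | DEMO_SYNC_WAIT) == DEMO_BUF_TRYLOCK);
}
```

This is why the callers that previously passed 0 are switched to SYNC_TRYLOCK in the other hunks: without that change their buffer lock attempt would silently become blocking.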
6 changes: 3 additions & 3 deletions fs/xfs/xfs_inode_item.c
@@ -760,11 +760,11 @@ xfs_inode_item_push(
* Push the inode to its backing buffer. This will not remove the
* inode from the AIL - a further push will be required to trigger a
* buffer push. However, this allows all the dirty inodes to be pushed
-* to the buffer before it is pushed to disk. THe buffer IO completion
-* will pull th einode from the AIL, mark it clean and unlock the flush
+* to the buffer before it is pushed to disk. The buffer IO completion
+* will pull the inode from the AIL, mark it clean and unlock the flush
* lock.
*/
-(void) xfs_iflush(ip, 0);
+(void) xfs_iflush(ip, SYNC_TRYLOCK);
xfs_iunlock(ip, XFS_ILOCK_SHARED);
}
