NFS: Slow down state manager after an unhandled error

If the state manager thread is not actually able to fully recover from some situation, it wakes up waiters, who kick off a new state manager thread. Quite often the fresh invocation of the state manager is just as successful. This results in a livelock as the client dumps thousands of NFS requests a second on the network in a vain attempt to recover. Not very friendly. To mitigate this situation, add a delay in the state manager after an unhandled error, so that the client sends just a few requests every second in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
mariux64 · Oct 1, 2012 · ffe5a83 · ffe5a83
1 parent 8cb7f74
commit ffe5a83
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
@@ -2015,6 +2015,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
 	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
 			" with error %d\n", section_sep, section,
 			clp->cl_hostname, -status);
+	ssleep(1);
 	nfs4_end_drain_session(clp);
 	nfs4_clear_state_manager_bit(clp);
 }