[DLM] fix stopping unstarted recovery
Red Hat BZ 211914

When many nodes are joining a lockspace simultaneously, the dlm gets a
quick sequence of stop/start events, a pair for adding each node.
dlm_controld in user space sends dlm_recoverd in the kernel each stop and
start event.  dlm_controld will sometimes send the stop before
dlm_recoverd has had a chance to take up the previously queued start.  The
stop aborts the processing of the previous start by setting the
RECOVERY_STOP flag.  dlm_recoverd is erroneously clearing this flag and
ignoring the stop/abort if it happens to take up the start after the stop
meant to abort it.  The fix is to check the sequence number that's
incremented for each stop/start before clearing the flag.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
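
The race described above can be replayed in a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `model_ls`, `model_rv` and every `model_*` function are hypothetical stand-ins for `struct dlm_ls`, `struct dlm_recover` and the stop/start paths, the spinlock is omitted because the replay is single-threaded, and the model assumes that both a stop and a start increment the sequence number, as the message states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dlm_recover. */
struct model_rv {
	uint64_t seq;              /* sequence number stamped when the start was queued */
};

/* Hypothetical stand-in for the relevant fields of struct dlm_ls. */
struct model_ls {
	bool recovery_stop;        /* models the LSFL_RECOVERY_STOP flag */
	uint64_t recover_seq;      /* models ls_recover_seq */
	struct model_rv *pending;  /* models ls_recover_args */
};

/* A start bumps the sequence, stamps it on the rv and queues the rv. */
static void model_start(struct model_ls *ls, struct model_rv *rv)
{
	rv->seq = ++ls->recover_seq;
	ls->pending = rv;
}

/* A stop bumps the sequence and requests that recovery be aborted. */
static void model_stop(struct model_ls *ls)
{
	ls->recover_seq++;
	ls->recovery_stop = true;
}

/*
 * The recovery thread takes the queued start.
 * check_seq == false models the old code, which always cleared the flag.
 * check_seq == true models the patched code, which clears the flag only
 * if no stop or start has happened since this rv was queued.
 */
static struct model_rv *model_take_start(struct model_ls *ls, bool check_seq)
{
	struct model_rv *rv = ls->pending;

	ls->pending = NULL;
	if (!check_seq)
		ls->recovery_stop = false;
	else if (rv && ls->recover_seq == rv->seq)
		ls->recovery_stop = false;
	return rv;
}

/*
 * Replays the problem ordering: start queued, stop arrives, and only then
 * does the recovery thread take the start. Returns whether the stop
 * request is still set afterwards.
 */
static bool stop_survives_late_pickup(bool check_seq)
{
	struct model_ls ls = { false, 0, NULL };
	struct model_rv rv = { 0 };

	model_start(&ls, &rv);
	model_stop(&ls);
	model_take_start(&ls, check_seq);
	return ls.recovery_stop;
}
```

In this model, `stop_survives_late_pickup(false)` loses the stop request, while `stop_survives_late_pickup(true)` keeps it, because the stop advanced the sequence number past the value stamped on the queued start. A start that is taken before any later stop still clears the flag in both variants.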
David Teigland authored and Steven Whitehouse committed Nov 30, 2006
1 parent 91c0dc9 commit 2cdc98a
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion fs/dlm/recoverd.c
@@ -219,14 +219,19 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv)
 	return error;
 }
 
+/* The dlm_ls_start() that created the rv we take here may already have been
+   stopped via dlm_ls_stop(); in that case we need to leave the RECOVERY_STOP
+   flag set. */
+
 static void do_ls_recovery(struct dlm_ls *ls)
 {
 	struct dlm_recover *rv = NULL;
 
 	spin_lock(&ls->ls_recover_lock);
 	rv = ls->ls_recover_args;
 	ls->ls_recover_args = NULL;
-	clear_bit(LSFL_RECOVERY_STOP, &ls->ls_flags);
+	if (rv && ls->ls_recover_seq == rv->seq)
+		clear_bit(LSFL_RECOVERY_STOP, &ls->ls_flags);
 	spin_unlock(&ls->ls_recover_lock);
 
 	if (rv) {
