Skip to content

Commit

Permalink
ocfs2/dlm: Clear joining_node on hearbeat node down
Browse files Browse the repository at this point in the history
Currently the process of dlm join contains 2 steps: query join and assert join.
After query join, the joined node will set its joining_node. So if the joining
node happens to panic before the 2nd step, the joined node will fail to clear
its joining_node flag because that node isn't in the domain map. It at least
cause 2 problems.
1. All the new join request will fail. So no new node can mount the volume.
2. The joined node can't umount the volume since during the umount process it
   has to wait for the joining_node to be unknown. So the umount will be hanged.

The solution is to clear the joining_node before we check the domain map.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
  • Loading branch information
Tao Ma authored and Mark Fasheh committed Jan 25, 2008
1 parent 4092d49 commit 2d4b1cb
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions fs/ocfs2/dlm/dlmrecovery.c
Original file line number Diff line number Diff line change
Expand Up @@ -2270,6 +2270,12 @@ static void __dlm_hb_node_down(struct dlm_ctxt *dlm, int idx)
}
}

/* Clean up join state on node death. */
if (dlm->joining_node == idx) {
mlog(0, "Clearing join state for node %u\n", idx);
__dlm_set_joining_node(dlm, DLM_LOCK_RES_OWNER_UNKNOWN);
}

/* check to see if the node is already considered dead */
if (!test_bit(idx, dlm->live_nodes_map)) {
mlog(0, "for domain %s, node %d is already dead. "
Expand All @@ -2288,12 +2294,6 @@ static void __dlm_hb_node_down(struct dlm_ctxt *dlm, int idx)

clear_bit(idx, dlm->live_nodes_map);

/* Clean up join state on node death. */
if (dlm->joining_node == idx) {
mlog(0, "Clearing join state for node %u\n", idx);
__dlm_set_joining_node(dlm, DLM_LOCK_RES_OWNER_UNKNOWN);
}

/* make sure local cleanup occurs before the heartbeat events */
if (!test_bit(idx, dlm->recovery_map))
dlm_do_local_recovery_cleanup(dlm, idx);
Expand Down

0 comments on commit 2d4b1cb

Please sign in to comment.