Skip to content

Commit

Permalink
sched: fix wait_for_completion_timeout() spurious failure under heavy…
Browse files Browse the repository at this point in the history
… load

It seems that the current implementaton of wait_for_completion_timeout()
has a small problem under very high load for the common pattern:

	if (!wait_for_completion_timeout(&done, timeout))
		/* handle failure */

because the implementation very roughly does (lots of code deleted to
show the basic flow):

	static inline long __sched
	do_wait_for_common(struct completion *x, long timeout, int state)
	{
		if (x->done)
			return timeout;

		do {
			timeout = schedule_timeout(timeout);

			if (!timeout)
				return timeout;

		} while (!x->done);

		return timeout;
	}

so if the system is very busy and x->done is not set when
do_wait_for_common() is entered, it is possible that the first call to
schedule_timeout() returns 0 because the task doing wait_for_completion
doesn't get rescheduled for a long time, even if it is woken up early
enough.

In this case, wait_for_completion_timeout() returns 0 without even
checking x->done again, and the code above falls into its failure case
purely for scheduler reasons, even if the hardware event or whatever was
being waited for happened early enough.

It would make sense to add an extra test to do_wait_for() in the timeout
case and return 1 if x->done is actually set.

A quick audit (not exhaustive) of wait_for_completion_timeout() callers
seems to indicate that no one actually cares about the return value in
the success case -- they just test for 0 (timed out) versus non-zero
(wait succeeded).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
  • Loading branch information
Roland Dreier authored and Ingo Molnar committed Jun 20, 2008
1 parent 8a8cde1 commit bb10ed0
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions kernel/sched.c
Original file line number Diff line number Diff line change
Expand Up @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state)
spin_unlock_irq(&x->wait.lock);
timeout = schedule_timeout(timeout);
spin_lock_irq(&x->wait.lock);

/*
* If the completion has arrived meanwhile
* then return 1 jiffy time left:
*/
if (x->done && !timeout) {
timeout = 1;
break;
}

if (!timeout) {
__remove_wait_queue(&x->wait, &wait);
return timeout;
Expand Down

0 comments on commit bb10ed0

Please sign in to comment.