Skip to content

Commit

Permalink
sched: Fix nohz load accounting -- again!
Browse files Browse the repository at this point in the history
Various people reported nohz load tracking still being wrecked, but Doug
spotted the actual problem. We fold the nohz remainder in too soon,
causing us to loose samples and under-account.

So instead of playing catch-up up-front, always do a single load-fold
with whatever state we encounter and only then fold the nohz remainder
and play catch-up.

Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: LesÅ=82aw Kope=C4=87 <leslaw.kopec@nasza-klasa.pl>
Reported-by: Aman Gupta <aman@tmm1.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
  • Loading branch information
Peter Zijlstra authored and Ingo Molnar committed Mar 12, 2012
1 parent 8e3fabf commit c308b56
Showing 1 changed file with 26 additions and 27 deletions.
53 changes: 26 additions & 27 deletions kernel/sched/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -2266,13 +2266,10 @@ calc_load_n(unsigned long load, unsigned long exp,
* Once we've updated the global active value, we need to apply the exponential
* weights adjusted to the number of cycles missed.
*/
static void calc_global_nohz(unsigned long ticks)
static void calc_global_nohz(void)
{
long delta, active, n;

if (time_before(jiffies, calc_load_update))
return;

/*
* If we crossed a calc_load_update boundary, make sure to fold
* any pending idle changes, the respective CPUs might have
Expand All @@ -2284,31 +2281,25 @@ static void calc_global_nohz(unsigned long ticks)
atomic_long_add(delta, &calc_load_tasks);

/*
* If we were idle for multiple load cycles, apply them.
* It could be the one fold was all it took, we done!
*/
if (ticks >= LOAD_FREQ) {
n = ticks / LOAD_FREQ;
if (time_before(jiffies, calc_load_update + 10))
return;

active = atomic_long_read(&calc_load_tasks);
active = active > 0 ? active * FIXED_1 : 0;
/*
* Catch-up, fold however many we are behind still
*/
delta = jiffies - calc_load_update - 10;
n = 1 + (delta / LOAD_FREQ);

avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
active = atomic_long_read(&calc_load_tasks);
active = active > 0 ? active * FIXED_1 : 0;

calc_load_update += n * LOAD_FREQ;
}
avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);

/*
* Its possible the remainder of the above division also crosses
* a LOAD_FREQ period, the regular check in calc_global_load()
* which comes after this will take care of that.
*
* Consider us being 11 ticks before a cycle completion, and us
* sleeping for 4*LOAD_FREQ + 22 ticks, then the above code will
* age us 4 cycles, and the test in calc_global_load() will
* pick up the final one.
*/
calc_load_update += n * LOAD_FREQ;
}
#else
void calc_load_account_idle(struct rq *this_rq)
Expand All @@ -2320,7 +2311,7 @@ static inline long calc_load_fold_idle(void)
return 0;
}

static void calc_global_nohz(unsigned long ticks)
static void calc_global_nohz(void)
{
}
#endif
Expand Down Expand Up @@ -2348,8 +2339,6 @@ void calc_global_load(unsigned long ticks)
{
long active;

calc_global_nohz(ticks);

if (time_before(jiffies, calc_load_update + 10))
return;

Expand All @@ -2361,6 +2350,16 @@ void calc_global_load(unsigned long ticks)
avenrun[2] = calc_load(avenrun[2], EXP_15, active);

calc_load_update += LOAD_FREQ;

/*
* Account one period with whatever state we found before
* folding in the nohz state and ageing the entire idle period.
*
* This avoids loosing a sample when we go idle between
* calc_load_account_active() (10 ticks ago) and now and thus
* under-accounting.
*/
calc_global_nohz();
}

/*
Expand Down

0 comments on commit c308b56

Please sign in to comment.