sched/numa: Examine a task move when examining a task swap
Running "perf bench numa mem -0 -m -P 1000 -p 8 -t 20" on a 4
node system results in 160 runnable threads on a system with 80
CPU threads.

Once a process has nearly converged, with 39 threads on one node
and 1 thread on another node, the remaining thread will be unable
to migrate to its preferred node through a task swap.

However, a simple task move would make the workload converge,
without causing an imbalance.

Test for this unlikely occurrence, and attempt a task move to
the preferred nid when it happens.

 # Running main, "perf bench numa mem -p 8 -t 20 -0 -m -P 1000"

 ###
 # 160 tasks will execute (on 4 nodes, 80 CPUs):
 #         -1x     0MB global  shared mem operations
 #         -1x  1000MB process shared mem operations
 #         -1x     0MB thread  local  mem operations
 ###

 ###
 #
 #    0.0%  [0.2 mins]  0/0   1/1  36/2   0/0  [36/3 ] l:  0-0   (  0) {0-2}
 #    0.0%  [0.3 mins] 43/3  37/2  39/2  41/3  [ 6/10] l:  0-1   (  1) {1-2}
 #    0.0%  [0.4 mins] 42/3  38/2  40/2  40/2  [ 4/9 ] l:  1-2   (  1) [50.0%] {1-2}
 #    0.0%  [0.6 mins] 41/3  39/2  40/2  40/2  [ 2/9 ] l:  2-4   (  2) [50.0%] {1-2}
 #    0.0%  [0.7 mins] 40/2  40/2  40/2  40/2  [ 0/8 ] l:  3-5   (  2) [40.0%] (  41.8s converged)

Without this patch, this same perf bench numa mem run had to
rely on the scheduler load balancer to first balance out the
load (moving a random task), before a task swap could complete
the NUMA convergence.

The load balancer does not normally take action unless the load
difference exceeds 25%. Convergence times of over half an hour
have been observed without this patch.
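
To see why the load balancer stays idle in the nearly converged state, here is a minimal sketch of a percentage-based imbalance test. It is not the kernel's load balancer code: the function name and the constant are hypothetical, the 125 value only mirrors the 25% figure quoted above, and the per-node task counts from the output (for example 41 and 39) stand in for load.

```c
#include <stdbool.h>

/*
 * Hypothetical constant: 125 means "act only when the busiest side
 * carries more than 125% of the local load", i.e. a 25% difference.
 */
#define IMBALANCE_PCT 125

/*
 * Simplified model, assumed for illustration only.
 * Returns true when busiest_load exceeds local_load by more than 25%.
 */
static bool balancer_would_act(long busiest_load, long local_load)
{
	return busiest_load * 100 > local_load * IMBALANCE_PCT;
}
```

With 41 tasks on one node and 39 on another, 4100 is not greater than 4875, so this model takes no action, which is consistent with the stalled convergence described above.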

With this patch, the NUMA balancing code will simply migrate the
task, if that does not cause an imbalance.
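
The choice between a swap and a plain move can be sketched as a small standalone function. This is a simplified model of the logic in the diff, not the kernel code itself: pick_action(), struct numa_choice and the move_imbalances flag are hypothetical names, the flag stands in for the result of load_too_imbalanced(), and the load check that a swap must also pass is left out.

```c
#include <stdbool.h>

enum numa_action { NUMA_NONE, NUMA_SWAP, NUMA_MOVE };

struct numa_choice {
	enum numa_action action;
	long score;	/* improvement recorded for the chosen action */
};

/*
 * Hypothetical helper. swap_imp and move_imp play the roles of imp and
 * moveimp in the patch; move_imbalances stands in for the result of
 * load_too_imbalanced() for the plain move.
 */
static struct numa_choice pick_action(long swap_imp, long move_imp,
				      long best_imp, bool move_imbalances)
{
	struct numa_choice c = { NUMA_NONE, best_imp };

	/* Neither option beats the best candidate found so far. */
	if (swap_imp <= best_imp && move_imp <= best_imp)
		return c;

	/* A plain move wins if it scores higher and keeps the load balanced. */
	if (move_imp > swap_imp && move_imp > best_imp && !move_imbalances) {
		c.action = NUMA_MOVE;
		c.score = move_imp - 1;	/* so an actually idle CPU will win */
		return c;
	}

	/* Otherwise fall back to the swap, if it is an improvement at all. */
	if (swap_imp > best_imp) {
		c.action = NUMA_SWAP;
		c.score = swap_imp;
	}
	return c;
}
```

For example, pick_action(-5, 10, 0, false) selects a move with score 9, while the same call with move_imbalances set to true selects nothing, because the swap itself is not an improvement.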

Also skip examining a CPU in detail if the improvement on that CPU
is no more than the best we already have.
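
The effect of the early skip can be illustrated with a scan over candidate CPUs. This is a sketch under stated assumptions: scan_candidates() and the imp[] array are hypothetical, and the real code tracks both imp and moveimp and runs load checks in the detailed step, which is omitted here.

```c
#include <stddef.h>

/*
 * Hypothetical scan: imp[i] is the improvement offered by candidate CPU i.
 * Returns the index of the first candidate that strictly beats best_imp
 * and all earlier candidates, or -1 if none does.
 */
static int scan_candidates(const long *imp, size_t n, long best_imp)
{
	int best_cpu = -1;

	for (size_t i = 0; i < n; i++) {
		/* No better than what we have: skip the detailed examination. */
		if (imp[i] <= best_imp)
			continue;

		best_imp = imp[i];
		best_cpu = (int)i;
	}
	return best_cpu;
}
```

Because the comparison is `<=` rather than `<`, a candidate that merely ties the current best is skipped, so the first of several equal candidates is kept.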

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: chegu_vinod@hp.com
Cc: mgorman@suse.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ggthh0rnh0yua6o5o3p6cr1o@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rik van Riel authored and Ingo Molnar committed Jul 5, 2014
1 parent 1c5d3eb commit 0132c3e
Showing 1 changed file with 21 additions and 2 deletions.
23 changes: 21 additions & 2 deletions kernel/sched/fair.c
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1155,6 +1155,7 @@ static void task_numa_compare(struct task_numa_env *env,
 	long src_load, dst_load;
 	long load;
 	long imp = env->p->numa_group ? groupimp : taskimp;
+	long moveimp = imp;

 	rcu_read_lock();
 	cur = ACCESS_ONCE(dst_rq->curr);
@@ -1201,7 +1202,7 @@ static void task_numa_compare(struct task_numa_env *env,
 		}
 	}

-	if (imp < env->best_imp)
+	if (imp <= env->best_imp && moveimp <= env->best_imp)
 		goto unlock;

 	if (!cur) {
@@ -1214,7 +1215,8 @@ static void task_numa_compare(struct task_numa_env *env,
 	}

 	/* Balance doesn't matter much if we're running a task per cpu */
-	if (src_rq->nr_running == 1 && dst_rq->nr_running == 1)
+	if (imp > env->best_imp && src_rq->nr_running == 1 &&
+			dst_rq->nr_running == 1)
 		goto assign;

 	/*
@@ -1230,6 +1232,23 @@ static void task_numa_compare(struct task_numa_env *env,
 	src_load += effective_load(tg, env->src_cpu, -load, -load);
 	dst_load += effective_load(tg, env->dst_cpu, load, load);

+	if (moveimp > imp && moveimp > env->best_imp) {
+		/*
+		 * If the improvement from just moving env->p direction is
+		 * better than swapping tasks around, check if a move is
+		 * possible. Store a slightly smaller score than moveimp,
+		 * so an actually idle CPU will win.
+		 */
+		if (!load_too_imbalanced(src_load, dst_load, env)) {
+			imp = moveimp - 1;
+			cur = NULL;
+			goto assign;
+		}
+	}
+
+	if (imp <= env->best_imp)
+		goto unlock;
+
 	if (cur) {
 		/* Cur moves in the opposite direction. */
 		load = cur->se.load.weight;
