Skip to content

Commit

Permalink
sched/numa: Count pages on active node as local
Browse files Browse the repository at this point in the history
The NUMA code is smart enough to distribute the memory of workloads
that span multiple NUMA nodes across those NUMA nodes.

However, it still has a pretty high scan rate for such workloads,
because any memory that is left on a node other than the node of
the CPU that faulted on the memory is counted as non-local, which
causes the scan rate to go up.

Counting the memory on any node where the task's numa group is
actively running as local, allows the scan rate to slow down
once the application is settled in.

This should reduce the overhead of the automatic NUMA placement
code, when a workload spans multiple NUMA nodes.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Vinod Chegu <chegu_vinod@hp.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1397235629-16328-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
  • Loading branch information
Rik van Riel authored and Ingo Molnar committed May 7, 2014
1 parent 2fe5de9 commit 792568e
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion kernel/sched/fair.c
Original file line number Diff line number Diff line change
Expand Up @@ -1738,6 +1738,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
struct task_struct *p = current;
bool migrated = flags & TNF_MIGRATED;
int cpu_node = task_node(current);
int local = !!(flags & TNF_FAULT_LOCAL);
int priv;

if (!numabalancing_enabled)
Expand Down Expand Up @@ -1786,6 +1787,17 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
task_numa_group(p, last_cpupid, flags, &priv);
}

/*
* If a workload spans multiple NUMA nodes, a shared fault that
* occurs wholly within the set of nodes that the workload is
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
if (!priv && !local && p->numa_group &&
node_isset(cpu_node, p->numa_group->active_nodes) &&
node_isset(mem_node, p->numa_group->active_nodes))
local = 1;

task_numa_placement(p);

/*
Expand All @@ -1800,7 +1812,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)

p->numa_faults_buffer_memory[task_faults_idx(mem_node, priv)] += pages;
p->numa_faults_buffer_cpu[task_faults_idx(cpu_node, priv)] += pages;
p->numa_faults_locality[!!(flags & TNF_FAULT_LOCAL)] += pages;
p->numa_faults_locality[local] += pages;
}

static void reset_ptenuma_scan(struct task_struct *p)
Expand Down

0 comments on commit 792568e

Please sign in to comment.