sched/numa: Classify the NUMA topology of a system
Smaller NUMA systems tend to have all NUMA nodes directly connected
to each other. This includes the degenerate case of a system with just
one node, i.e. a non-NUMA system.

Larger systems can have one of two kinds of NUMA topology, and the kind
affects how tasks and memory should be placed on the system.

On glueless mesh systems, nodes that are not directly connected to
each other will bounce traffic through intermediary nodes. Task groups
can be run closer to each other by moving tasks from a node to an
intermediary node between it and the task's preferred node.

On NUMA systems with backplane controllers, the intermediary hops
are incapable of running programs. This creates "islands" of nodes
that are at an equal distance to anywhere else in the system.

Each kind of topology requires a slightly different placement
algorithm; this patch provides the mechanism to detect the kind
of NUMA topology of a system.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
[ Changed to use kernel/sched/sched.h ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: mgorman@suse.de
Cc: chegu_vinod@hp.com
Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
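
The tests described in the commit message can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: classify_topology(), MAX_NODES and the three sample tables are hypothetical names introduced here, and the tables are assumed to hold hop counts (0 on the diagonal), whereas the kernel works on the values returned by node_distance(). Unlike the patch's init_numa_topology_type(), the sketch returns as soon as it identifies a directly connected system.

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical limit, for illustration only */

/* Same three values as the enum this patch adds to kernel/sched/sched.h. */
enum numa_topology_type {
	NUMA_DIRECT,
	NUMA_GLUELESS_MESH,
	NUMA_BACKPLANE,
};

/*
 * Hypothetical helper: classify a system of nr nodes from a table of
 * hop counts. hops[a][b] is assumed to be the number of hops from
 * node a to node b, with hops[a][a] == 0.
 */
static enum numa_topology_type
classify_topology(int nr, const int hops[MAX_NODES][MAX_NODES])
{
	int a, b, c, n = 0;

	assert(nr >= 1 && nr <= MAX_NODES);

	/* n is the largest hop count between any two nodes. */
	for (a = 0; a < nr; a++)
		for (b = 0; b < nr; b++)
			if (hops[a][b] > n)
				n = hops[a][b];

	/* At most one hop everywhere: directly connected, or not NUMA. */
	if (n <= 1)
		return NUMA_DIRECT;

	for (a = 0; a < nr; a++) {
		for (b = 0; b < nr; b++) {
			/* Only consider a pair of nodes that is n hops apart. */
			if (hops[a][b] < n)
				continue;

			/* A node closer than n to both ends is an intermediary. */
			for (c = 0; c < nr; c++)
				if (hops[a][c] < n && hops[b][c] < n)
					return NUMA_GLUELESS_MESH;

			/* No such node: traffic must cross a backplane. */
			return NUMA_BACKPLANE;
		}
	}

	return NUMA_DIRECT;	/* not reached when n > 1 */
}

/* Two nodes, directly connected. */
static const int two_node[MAX_NODES][MAX_NODES] = {
	{ 0, 1 },
	{ 1, 0 },
};

/* Four nodes in a ring: opposite corners talk through a neighbour. */
static const int ring4[MAX_NODES][MAX_NODES] = {
	{ 0, 1, 2, 1 },
	{ 1, 0, 1, 2 },
	{ 2, 1, 0, 1 },
	{ 1, 2, 1, 0 },
};

/* Two islands {0,1} and {2,3} joined only by a backplane controller. */
static const int islands4[MAX_NODES][MAX_NODES] = {
	{ 0, 1, 2, 2 },
	{ 1, 0, 2, 2 },
	{ 2, 2, 0, 1 },
	{ 2, 2, 1, 0 },
};
```

Under these assumptions, classify_topology(2, two_node) yields NUMA_DIRECT, classify_topology(4, ring4) yields NUMA_GLUELESS_MESH (node 1 sits between nodes 0 and 2), and classify_topology(4, islands4) yields NUMA_BACKPLANE, because no node is closer than two hops to both node 0 and node 2.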
Rik van Riel authored and Ingo Molnar committed Oct 28, 2014
1 parent 9942f79 commit e3fe70b
Showing 2 changed files with 59 additions and 0 deletions.
53 changes: 53 additions & 0 deletions kernel/sched/core.c
@@ -6128,6 +6128,7 @@ static void claim_allocations(int cpu, struct sched_domain *sd)

#ifdef CONFIG_NUMA
static int sched_domains_numa_levels;
enum numa_topology_type sched_numa_topology_type;
static int *sched_domains_numa_distance;
int sched_max_numa_distance;
static struct cpumask ***sched_domains_numa_masks;
@@ -6316,6 +6317,56 @@ bool find_numa_distance(int distance)
	return false;
}

/*
 * A system can have three types of NUMA topology:
 * NUMA_DIRECT: all nodes are directly connected, or not a NUMA system
 * NUMA_GLUELESS_MESH: some nodes reachable through intermediary nodes
 * NUMA_BACKPLANE: nodes can reach other nodes through a backplane
 *
 * The difference between a glueless mesh topology and a backplane
 * topology lies in whether communication between not directly
 * connected nodes goes through intermediary nodes (where programs
 * could run), or through backplane controllers. This affects
 * placement of programs.
 *
 * The type of topology can be discerned with the following tests:
 * - If the maximum distance between any nodes is 1 hop, the system
 *   is directly connected.
 * - If for two nodes A and B, located N > 1 hops away from each other,
 *   there is an intermediary node C, which is < N hops away from both
 *   nodes A and B, the system is a glueless mesh.
 */
static void init_numa_topology_type(void)
{
	int a, b, c, n;

	n = sched_max_numa_distance;

	if (n <= 1)
		sched_numa_topology_type = NUMA_DIRECT;

	for_each_online_node(a) {
		for_each_online_node(b) {
			/* Find two nodes furthest removed from each other. */
			if (node_distance(a, b) < n)
				continue;

			/* Is there an intermediary node between a and b? */
			for_each_online_node(c) {
				if (node_distance(a, c) < n &&
				    node_distance(b, c) < n) {
					sched_numa_topology_type =
							NUMA_GLUELESS_MESH;
					return;
				}
			}

			sched_numa_topology_type = NUMA_BACKPLANE;
			return;
		}
	}
}

static void sched_init_numa(void)
{
	int next_distance, curr_distance = node_distance(0, 0);
@@ -6449,6 +6500,8 @@ static void sched_init_numa(void)

	sched_domains_numa_levels = level;
	sched_max_numa_distance = sched_domains_numa_distance[level - 1];

	init_numa_topology_type();
}

static void sched_domains_numa_masks_set(int cpu)
6 changes: 6 additions & 0 deletions kernel/sched/sched.h
@@ -679,6 +679,12 @@ static inline u64 rq_clock_task(struct rq *rq)
}

#ifdef CONFIG_NUMA
enum numa_topology_type {
	NUMA_DIRECT,
	NUMA_GLUELESS_MESH,
	NUMA_BACKPLANE,
};
extern enum numa_topology_type sched_numa_topology_type;
extern int sched_max_numa_distance;
extern bool find_numa_distance(int distance);
#endif