Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler changes from Ingo Molnar:
 "The biggest change is the cleanup/simplification of the load-balancer:
  instead of the current practice of architectures twiddling scheduler
  internal data structures and providing the scheduler domains in
  colorfully inconsistent ways, we now have generic scheduler code in
  kernel/sched/core.c:sched_init_numa() that looks at the architecture's
  node_distance() parameters and (while not fully trusting it) deducts a
  NUMA topology from it.

  This inevitably changes balancing behavior - hopefully for the better.

  There are various smaller optimizations, cleanups and fixlets as well"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug
  sched: Remove stale power aware scheduling remnants and dysfunctional knobs
  sched/debug: Fix printing large integers on 32-bit platforms
  sched/fair: Improve the ->group_imb logic
  sched/nohz: Fix rq->cpu_load[] calculations
  sched/numa: Don't scale the imbalance
  sched/fair: Revert sched-domain iteration breakage
  sched/x86: Rewrite set_cpu_sibling_map()
  sched/numa: Fix the new NUMA topology bits
  sched/numa: Rewrite the CONFIG_NUMA sched domain support
  sched/fair: Propagate 'struct lb_env' usage into find_busiest_group
  sched/fair: Add some serialization to the sched_domain load-balance walk
  sched/fair: Let minimally loaded cpu balance the group
  sched: Change rq->nr_running to unsigned int
  x86/numa: Check for nonsensical topologies on real hw as well
  x86/numa: Hard partition cpu topology masks on node boundaries
  x86/numa: Allow specifying node_distance() for numa=fake
  x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly
  sched: Update documentation and comments
  sched_rt: Avoid unnecessary dequeue and enqueue of pushable tasks in set_cpus_allowed_rt()
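
The topology deduction described in the pull message can be illustrated with a small standalone sketch. This is not the kernel's sched_init_numa(); it only assumes that node_distance() can be represented as a square table, and the names MAX_NODES, collect_numa_levels() and nodes_within() are hypothetical. The idea shown: each distinct distance value becomes a candidate balancing level, and the span of a level for a given node is the set of nodes at or below that distance.

```c
#include <stddef.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */

/*
 * Collect the distinct values of a node-distance table in ascending order.
 * Each distinct distance corresponds to one candidate NUMA level.
 * Returns the number of levels written to levels[].
 */
int collect_numa_levels(int nr_nodes,
                        const int dist[MAX_NODES][MAX_NODES],
                        int levels[MAX_NODES * MAX_NODES])
{
    int nr_levels = 0;

    for (int i = 0; i < nr_nodes; i++) {
        for (int j = 0; j < nr_nodes; j++) {
            int d = dist[i][j];
            int pos = 0;

            /* Find the insertion point in the sorted list. */
            while (pos < nr_levels && levels[pos] < d)
                pos++;
            if (pos < nr_levels && levels[pos] == d)
                continue;       /* distance already recorded */

            /* Shift larger entries up and insert d. */
            for (int k = nr_levels; k > pos; k--)
                levels[k] = levels[k - 1];
            levels[pos] = d;
            nr_levels++;
        }
    }
    return nr_levels;
}

/*
 * Bitmask of the nodes whose distance from 'node' is at most 'max_dist'.
 * Bit n is set when node n belongs to the span.
 */
unsigned int nodes_within(int nr_nodes,
                          const int dist[MAX_NODES][MAX_NODES],
                          int node, int max_dist)
{
    unsigned int mask = 0;

    for (int n = 0; n < nr_nodes; n++)
        if (dist[node][n] <= max_dist)
            mask |= 1u << n;
    return mask;
}
```

For a four-node table in which nodes 0-1 and 2-3 form close pairs (local distance 10, pair distance 20, cross distance 30), collect_numa_levels() reports three levels, and nodes_within() for node 0 at distance 20 yields the mask 0x3.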
Linus Torvalds committed May 23, 2012
2 parents 2ff2b28 + 1c2927f commit d79ee93
Showing 25 changed files with 441 additions and 1,005 deletions.
25 changes: 0 additions & 25 deletions Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -9,31 +9,6 @@ Description:

/sys/devices/system/cpu/cpu#/

What: /sys/devices/system/cpu/sched_mc_power_savings
/sys/devices/system/cpu/sched_smt_power_savings
Date: June 2006
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Discover and adjust the kernel's multi-core scheduler support.

Possible values are:

0 - No power saving load balance (default value)
1 - Fill one thread/core/package first for long running threads
2 - Also bias task wakeups to semi-idle cpu package for power
savings

sched_mc_power_savings is dependent upon SCHED_MC, which is
itself architecture dependent.

sched_smt_power_savings is dependent upon SCHED_SMT, which
is itself architecture dependent.

The two files are independent of each other. It is possible
that one file may be present without the other.

Introduced by git commit 5c45bf27.


What: /sys/devices/system/cpu/kernel_max
/sys/devices/system/cpu/offline
/sys/devices/system/cpu/online
6 changes: 3 additions & 3 deletions Documentation/scheduler/sched-design-CFS.txt
@@ -130,7 +130,7 @@ CFS implements three scheduling policies:
idle timer scheduler in order to avoid to get into priority
inversion problems which would deadlock the machine.

-SCHED_FIFO/_RR are implemented in sched_rt.c and are as specified by
+SCHED_FIFO/_RR are implemented in sched/rt.c and are as specified by
POSIX.

The command chrt from util-linux-ng 2.13.1.1 can set all of these except
@@ -145,9 +145,9 @@ Classes," an extensible hierarchy of scheduler modules. These modules
encapsulate scheduling policy details and are handled by the scheduler core
without the core code assuming too much about them.

-sched_fair.c implements the CFS scheduler described above.
+sched/fair.c implements the CFS scheduler described above.

-sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
+sched/rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than
the previous vanilla scheduler did. It uses 100 runqueues (for all 100 RT
priority levels, instead of 140 in the previous scheduler) and it needs no
expired array.
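
The 100-runqueue layout mentioned in the text above can be pictured as an array of per-priority queues plus a bitmap of non-empty levels. The sketch below is an illustration under stated assumptions, not the code in sched/rt.c: the struct and function names are hypothetical, a counter stands in for each queue's task list, and index 0 is assumed to be the highest priority. It keeps a single array, in line with the note that no expired array is needed.

```c
#include <limits.h>
#include <stddef.h>

#define RT_PRIO_LEVELS 100
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   ((RT_PRIO_LEVELS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for a per-CPU RT priority array. */
struct rt_prio_array_sketch {
    unsigned long bitmap[BITMAP_WORDS];  /* bit p set => level p is non-empty */
    int nr_queued[RT_PRIO_LEVELS];       /* counter in place of a task list */
};

/* Queue one task at the given level and mark the level as non-empty. */
void rt_enqueue(struct rt_prio_array_sketch *a, unsigned int prio)
{
    a->nr_queued[prio]++;
    a->bitmap[prio / BITS_PER_WORD] |= 1UL << (prio % BITS_PER_WORD);
}

/* Remove one task; clear the bitmap bit when the level becomes empty. */
void rt_dequeue(struct rt_prio_array_sketch *a, unsigned int prio)
{
    if (--a->nr_queued[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~(1UL << (prio % BITS_PER_WORD));
}

/* Return the lowest-numbered non-empty level, or -1 if nothing is queued. */
int rt_highest_prio(const struct rt_prio_array_sketch *a)
{
    for (unsigned int p = 0; p < RT_PRIO_LEVELS; p++)
        if (a->bitmap[p / BITS_PER_WORD] & (1UL << (p % BITS_PER_WORD)))
            return (int)p;
    return -1;
}
```

A production implementation would typically use a find-first-set instruction on each bitmap word rather than testing bits one at a time; the linear scan here is only for readability.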
4 changes: 0 additions & 4 deletions Documentation/scheduler/sched-domains.txt
@@ -61,10 +61,6 @@ The implementor should read comments in include/linux/sched.h:
struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
the specifics and what to tune.

For SMT, the architecture must define CONFIG_SCHED_SMT and provide a
cpumask_t cpu_sibling_map[NR_CPUS], where cpu_sibling_map[i] is the mask of
all "i"'s siblings as well as "i" itself.

Architectures may retain the regular override the default SD_*_INIT flags
while using the generic domain builder in kernel/sched.c if they wish to
retain the traditional SMT->SMP->NUMA topology (or some subset of that). This
25 changes: 0 additions & 25 deletions arch/ia64/include/asm/topology.h
@@ -70,31 +70,6 @@ void build_cpu_to_node_map(void);
.nr_balance_failed = 0, \
}

/* sched_domains SD_NODE_INIT for IA64 NUMA machines */
#define SD_NODE_INIT (struct sched_domain) { \
.parent = NULL, \
.child = NULL, \
.groups = NULL, \
.min_interval = 8, \
.max_interval = 8*(min(num_online_cpus(), 32U)), \
.busy_factor = 64, \
.imbalance_pct = 125, \
.cache_nice_tries = 2, \
.busy_idx = 3, \
.idle_idx = 2, \
.newidle_idx = 0, \
.wake_idx = 0, \
.forkexec_idx = 0, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_NEWIDLE \
| SD_BALANCE_EXEC \
| SD_BALANCE_FORK \
| SD_SERIALIZE, \
.last_balance = jiffies, \
.balance_interval = 64, \
.nr_balance_failed = 0, \
}

#endif /* CONFIG_NUMA */

#ifdef CONFIG_SMP
17 changes: 0 additions & 17 deletions arch/mips/include/asm/mach-ip27/topology.h
@@ -36,23 +36,6 @@ extern unsigned char __node_distances[MAX_COMPACT_NODES][MAX_COMPACT_NODES];

#define node_distance(from, to) (__node_distances[(from)][(to)])

/* sched_domains SD_NODE_INIT for SGI IP27 machines */
#define SD_NODE_INIT (struct sched_domain) { \
.parent = NULL, \
.child = NULL, \
.groups = NULL, \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
.flags = SD_LOAD_BALANCE | \
SD_BALANCE_EXEC, \
.last_balance = jiffies, \
.balance_interval = 1, \
.nr_balance_failed = 0, \
}

#include <asm-generic/topology.h>

#endif /* _ASM_MACH_TOPOLOGY_H */
36 changes: 0 additions & 36 deletions arch/powerpc/include/asm/topology.h
@@ -18,12 +18,6 @@ struct device_node;
*/
#define RECLAIM_DISTANCE 10

/*
* Avoid creating an extra level of balancing (SD_ALLNODES) on the largest
* POWER7 boxes which have a maximum of 32 nodes.
*/
#define SD_NODES_PER_DOMAIN 32

#include <asm/mmzone.h>

static inline int cpu_to_node(int cpu)
@@ -51,36 +45,6 @@ static inline int pcibus_to_node(struct pci_bus *bus)
cpu_all_mask : \
cpumask_of_node(pcibus_to_node(bus)))

/* sched_domains SD_NODE_INIT for PPC64 machines */
#define SD_NODE_INIT (struct sched_domain) { \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
.busy_idx = 3, \
.idle_idx = 1, \
.newidle_idx = 0, \
.wake_idx = 0, \
.forkexec_idx = 0, \
\
.flags = 1*SD_LOAD_BALANCE \
| 0*SD_BALANCE_NEWIDLE \
| 1*SD_BALANCE_EXEC \
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
| 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_POWERSAVINGS_BALANCE \
| 0*SD_SHARE_PKG_RESOURCES \
| 1*SD_SERIALIZE \
| 0*SD_PREFER_SIBLING \
, \
.last_balance = jiffies, \
.balance_interval = 1, \
}

extern int __node_distance(int, int);
#define node_distance(a, b) __node_distance(a, b)
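
The removed SD_NODE_INIT definitions above build their .flags value with a 1*FLAG / 0*FLAG idiom. A minimal sketch of how that idiom evaluates follows; the DEMO_* names and bit values are hypothetical stand-ins and are not the kernel's SD_* constants.

```c
/* Hypothetical flag bits for this sketch; not the kernel's SD_* values. */
enum demo_sd_flag {
    DEMO_LOAD_BALANCE    = 1 << 0,
    DEMO_BALANCE_NEWIDLE = 1 << 1,
    DEMO_BALANCE_EXEC    = 1 << 2,
    DEMO_BALANCE_FORK    = 1 << 3,
    DEMO_WAKE_AFFINE     = 1 << 4,
    DEMO_SERIALIZE       = 1 << 5,
};

/*
 * Every known flag is listed, multiplied by 1 (enabled) or 0 (disabled).
 * A 0* term contributes nothing to the OR, so the result equals the OR of
 * the enabled flags alone, while the disabled flags stay visible in the
 * source and can be toggled by changing a single character.
 */
unsigned int demo_node_flags(void)
{
    return 1*DEMO_LOAD_BALANCE
         | 0*DEMO_BALANCE_NEWIDLE
         | 1*DEMO_BALANCE_EXEC
         | 1*DEMO_BALANCE_FORK
         | 1*DEMO_WAKE_AFFINE
         | 1*DEMO_SERIALIZE
         ;
}
```

The trailing operator-first layout, with the closing punctuation on its own line, keeps each flag on one line so that a change to any single flag shows up as a one-line diff.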

25 changes: 0 additions & 25 deletions arch/sh/include/asm/topology.h
@@ -3,31 +3,6 @@

#ifdef CONFIG_NUMA

/* sched_domains SD_NODE_INIT for sh machines */
#define SD_NODE_INIT (struct sched_domain) { \
.parent = NULL, \
.child = NULL, \
.groups = NULL, \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 2, \
.busy_idx = 3, \
.idle_idx = 2, \
.newidle_idx = 0, \
.wake_idx = 0, \
.forkexec_idx = 0, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_FORK \
| SD_BALANCE_EXEC \
| SD_BALANCE_NEWIDLE \
| SD_SERIALIZE, \
.last_balance = jiffies, \
.balance_interval = 1, \
.nr_balance_failed = 0, \
}

#define cpu_to_node(cpu) ((void)(cpu),0)
#define parent_node(node) ((void)(node),0)

19 changes: 0 additions & 19 deletions arch/sparc/include/asm/topology_64.h
@@ -31,25 +31,6 @@ static inline int pcibus_to_node(struct pci_bus *pbus)
cpu_all_mask : \
cpumask_of_node(pcibus_to_node(bus)))

#define SD_NODE_INIT (struct sched_domain) { \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 2, \
.busy_idx = 3, \
.idle_idx = 2, \
.newidle_idx = 0, \
.wake_idx = 0, \
.forkexec_idx = 0, \
.flags = SD_LOAD_BALANCE \
| SD_BALANCE_FORK \
| SD_BALANCE_EXEC \
| SD_SERIALIZE, \
.last_balance = jiffies, \
.balance_interval = 1, \
}

#else /* CONFIG_NUMA */

#include <asm-generic/topology.h>
26 changes: 0 additions & 26 deletions arch/tile/include/asm/topology.h
@@ -78,32 +78,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
.balance_interval = 32, \
}

/* sched_domains SD_NODE_INIT for TILE architecture */
#define SD_NODE_INIT (struct sched_domain) { \
.min_interval = 16, \
.max_interval = 512, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = 1, \
.busy_idx = 3, \
.idle_idx = 1, \
.newidle_idx = 2, \
.wake_idx = 1, \
.flags = 1*SD_LOAD_BALANCE \
| 1*SD_BALANCE_NEWIDLE \
| 1*SD_BALANCE_EXEC \
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 0*SD_WAKE_AFFINE \
| 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_SHARE_PKG_RESOURCES \
| 1*SD_SERIALIZE \
, \
.last_balance = jiffies, \
.balance_interval = 128, \
}

/* By definition, we create nodes based on online memory. */
#define node_has_online_mem(nid) 1

38 changes: 0 additions & 38 deletions arch/x86/include/asm/topology.h
@@ -92,44 +92,6 @@ extern void setup_node_to_cpumask_map(void);

#define pcibus_to_node(bus) __pcibus_to_node(bus)

#ifdef CONFIG_X86_32
# define SD_CACHE_NICE_TRIES 1
# define SD_IDLE_IDX 1
#else
# define SD_CACHE_NICE_TRIES 2
# define SD_IDLE_IDX 2
#endif

/* sched_domains SD_NODE_INIT for NUMA machines */
#define SD_NODE_INIT (struct sched_domain) { \
.min_interval = 8, \
.max_interval = 32, \
.busy_factor = 32, \
.imbalance_pct = 125, \
.cache_nice_tries = SD_CACHE_NICE_TRIES, \
.busy_idx = 3, \
.idle_idx = SD_IDLE_IDX, \
.newidle_idx = 0, \
.wake_idx = 0, \
.forkexec_idx = 0, \
\
.flags = 1*SD_LOAD_BALANCE \
| 1*SD_BALANCE_NEWIDLE \
| 1*SD_BALANCE_EXEC \
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
| 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_POWERSAVINGS_BALANCE \
| 0*SD_SHARE_PKG_RESOURCES \
| 1*SD_SERIALIZE \
| 0*SD_PREFER_SIBLING \
, \
.last_balance = jiffies, \
.balance_interval = 1, \
}

extern int __node_distance(int, int);
#define node_distance(a, b) __node_distance(a, b)

8 changes: 8 additions & 0 deletions arch/x86/kernel/process.c
@@ -582,9 +582,17 @@ int mwait_usable(const struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;

+	/* Use mwait if idle=mwait boot option is given */
 	if (boot_option_idle_override == IDLE_FORCE_MWAIT)
 		return 1;

+	/*
+	 * Any idle= boot option other than idle=mwait means that we must not
+	 * use mwait. Eg: idle=halt or idle=poll or idle=nomwait
+	 */
+	if (boot_option_idle_override != IDLE_NO_OVERRIDE)
+		return 0;
+
 	if (c->cpuid_level < MWAIT_INFO)
 		return 0;
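
The ordering of the checks in this hunk matters: an explicit idle=mwait wins, any other idle= option vetoes mwait, and only without an override do the CPU capability checks decide. The standalone sketch below mirrors that ordering under stated assumptions; the enum and function names are hypothetical, and the cpu_supports_mwait parameter is a stand-in for the CPUID checks that follow in the real function.

```c
/* Hypothetical mirror of the idle= override states named in the hunk. */
enum idle_override_sketch {
    SKETCH_NO_OVERRIDE,    /* no idle= option given */
    SKETCH_HALT,           /* idle=halt */
    SKETCH_NOMWAIT,        /* idle=nomwait */
    SKETCH_POLL,           /* idle=poll */
    SKETCH_FORCE_MWAIT,    /* idle=mwait */
};

/*
 * Decide whether mwait may be used.
 * cpu_supports_mwait stands in for the CPUID-based checks; it is only
 * consulted when no idle= override is present.
 */
int mwait_usable_sketch(enum idle_override_sketch override,
                        int cpu_supports_mwait)
{
    /* idle=mwait forces mwait regardless of the capability checks. */
    if (override == SKETCH_FORCE_MWAIT)
        return 1;

    /* Any other idle= option rules mwait out. */
    if (override != SKETCH_NO_OVERRIDE)
        return 0;

    /* No override: fall through to the capability checks. */
    return cpu_supports_mwait ? 1 : 0;
}
```

Before the added check, a CPU that passed the capability checks could still be reported as mwait-usable even when idle=halt, idle=poll or idle=nomwait had been given; the second test closes that gap.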
