sched-core-2022-08-01

Load-balancing improvements:
============================

- Improve NUMA balancing on AMD Zen systems for affine workloads.

- Improve the handling of reduced-capacity CPUs in load-balancing.

- Energy Model improvements: fix & refine all the energy fairness metrics (PELT),
  and remove the conservative threshold requiring 6% energy savings to
  migrate a task. Doing this improves power efficiency for most workloads,
  and also increases the reliability of energy-efficiency scheduling.

- Optimize/tweak select_idle_cpu() to spend (much) less time searching
  for an idle CPU on overloaded systems. There are reports of several
  milliseconds spent there on large systems with large workloads ...

  [ Since the search logic changed, there might be behavioral side effects. ]

- Improve NUMA imbalance behavior. On certain systems
  with spare capacity, initial placement of tasks is non-deterministic,
  and such an artificial placement imbalance can persist for a long time,
  hurting (and sometimes helping) performance.

  The fix is to make fork-time task placement consistent with runtime
  NUMA balancing placement.

  Note that some performance regressions were reported against this,
  caused by workloads that are not memory bandwidth limited, which benefit
  from the artificial locality of the placement bug(s). Mel Gorman's
  conclusion, with which we concur, was that consistency is better than
  random workload benefits from non-deterministic bugs:

     "Given there is no crystal ball and it's a tradeoff, I think it's
      better to be consistent and use similar logic at both fork time
      and runtime even if it doesn't have universal benefit."

- Improve core scheduling by fixing a bug in sched_core_update_cookie() that
  caused unnecessary forced idling.

- Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly
  woken tasks.

- Fix a newidle balancing bug that introduced unnecessary wakeup latencies.

ABI improvements/fixes:
=======================

- Do not check capabilities and do not issue capability check denial messages
  when a scheduler syscall doesn't require privileges. (Such as increasing niceness.)

- Add forced-idle accounting to cgroups too.

- Fix/improve the RSEQ ABI to not just silently accept unknown flags.
  (No existing tooling is known to have learned to rely on the previous behavior.)

- Deprecate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags.
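
As an illustration of the capability-check item above, here is a minimal
user-space sketch of a scheduler call that needs no privileges: raising the
niceness of the calling process. The helper name raise_niceness() is
hypothetical and not part of this pull; only the standard getpriority() and
setpriority() calls are used.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the calling process's nice value by 'delta'
 * (delta >= 0), clamped to the conventional maximum of 19.
 * Increasing niceness lowers priority and requires no privileges.
 * Returns 0 on success, -1 on error with errno set.
 */
static int raise_niceness(int delta)
{
    int cur, target;

    if (delta < 0) {
        errno = EINVAL;     /* lowering niceness is out of scope here */
        return -1;
    }

    /* getpriority() can legitimately return -1, so check errno too. */
    errno = 0;
    cur = getpriority(PRIO_PROCESS, 0);
    if (cur == -1 && errno != 0)
        return -1;

    target = cur + delta;
    if (target > 19)
        target = 19;

    return setpriority(PRIO_PROCESS, 0, target);
}
```

An unprivileged caller is expected to succeed with raise_niceness(1); with
this change such a call should no longer trigger a capability check or a
denial message.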

Optimizations:
==============

- Optimize & simplify leaf_cfs_rq_list()

- Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg().
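
The try_cmpxchg() pattern mentioned above can be sketched in user space with
C11 atomics, whose compare-exchange likewise writes the observed value back
into the "expected" variable on failure, so the retry loop needs no separate
reload. This is an analogy only, not the kernel code: the flag names and the
function set_resched_if_polling() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, standing in for the kernel's thread-info flags. */
#define FLAG_POLLING       (1u << 0)
#define FLAG_NEED_RESCHED  (1u << 1)

/*
 * Set FLAG_NEED_RESCHED only if FLAG_POLLING is currently set.
 * Returns true if the resched flag is set on return (by us or already),
 * false if the target was not polling.
 */
static bool set_resched_if_polling(atomic_uint *flags)
{
    unsigned int val = atomic_load(flags);

    for (;;) {
        if (!(val & FLAG_POLLING))
            return false;
        if (val & FLAG_NEED_RESCHED)
            return true;
        /*
         * On failure 'val' is refreshed with the current contents of
         * *flags, so the loop re-evaluates without an explicit re-read.
         */
        if (atomic_compare_exchange_weak(flags, &val,
                                         val | FLAG_NEED_RESCHED))
            return true;
    }
}
```

Compared with a plain compare-and-swap that returns the old value, this form
avoids an extra comparison and reload in the loop body, which is the kind of
saving the micro-optimization is after.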

Misc fixes & cleanups:
======================

- Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems.

- Fix a full-NOHZ bug that can in some cases result in the tick not being
  re-enabled when the last SCHED_RT task is gone from a runqueue but there are
  still SCHED_OTHER tasks around.

- Various PREEMPT_RT related fixes.

- Misc cleanups & smaller fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>