sched/numa-balancing: Move some document to make it consistent with the code

After commit 8a99b68 ("sched: Move SCHED_DEBUG sysctl to debugfs"), some NUMA balancing sysctls enclosed with SCHED_DEBUG have been moved to debugfs. This patch moves the documentation for these sysctls from Documentation/admin-guide/sysctl/kernel.rst to Documentation/scheduler/sched-debug.rst to make the documentation consistent with the code.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20220210052514.3038279-1-ying.huang@intel.com
Huang Ying authored and Peter Zijlstra committed Feb 11, 2022
1 parent e496132, commit 3624ba7
Showing 3 changed files with 56 additions and 45 deletions.
@@ -17,6 +17,7 @@ Linux Scheduler
sched-nice-design
sched-rt-group
sched-stats
sched-debug

text_files
@@ -0,0 +1,54 @@
=================
Scheduler debugfs
=================

Booting a kernel with CONFIG_SCHED_DEBUG=y gives access to
scheduler-specific debug files under /sys/kernel/debug/sched. Some of
those files are described below.

numa_balancing
==============

The `numa_balancing` directory holds the files that control the NUMA
balancing feature. If the system overhead from the feature is too
high, the rate at which the kernel samples for NUMA hinting faults may
be controlled by the `scan_period_min_ms`, `scan_delay_ms`,
`scan_period_max_ms` and `scan_size_mb` files.
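
As a rough sketch, the files named above could be read with a few lines of Python. This assumes debugfs is mounted at /sys/kernel/debug, that each file holds a single integer, and that the caller has permission to read debugfs (normally root). The helper name ``read_numa_balancing`` and its ``base`` parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# Directory described above; assumes debugfs is mounted at /sys/kernel/debug.
NUMA_BALANCING_DIR = Path("/sys/kernel/debug/sched/numa_balancing")

# File names taken from this document.
TUNABLES = (
    "scan_period_min_ms",
    "scan_delay_ms",
    "scan_period_max_ms",
    "scan_size_mb",
)


def read_numa_balancing(base=NUMA_BALANCING_DIR):
    """Return {name: int} for every tunable that exists and parses.

    Hypothetical helper: assumes each file contains one integer.
    """
    values = {}
    for name in TUNABLES:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or unexpected content: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_numa_balancing().items():
        print(f"{name} = {value}")
```

On a kernel without CONFIG_SCHED_DEBUG=y, or when run without sufficient privileges, the helper simply returns an empty dictionary rather than raising.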


scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
-------------------------------------------------------------------

Automatic NUMA balancing scans a task's address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running. Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.
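
The wrap-around behaviour described above can be pictured with a toy model. The sketch below is not the kernel's implementation: it treats the address space as one contiguous range measured in megabytes, and the names ``scan_windows``, ``address_space_mb`` and ``start_mb`` are hypothetical.

```python
from itertools import islice


def scan_windows(address_space_mb, scan_size_mb, start_mb=0):
    """Yield successive (start, end) scan windows in MB, forever.

    Toy model only: one contiguous address space, one window per scan.
    """
    if address_space_mb <= 0 or scan_size_mb <= 0:
        raise ValueError("sizes must be positive")
    offset = start_mb % address_space_mb
    while True:
        # A window never extends past the end of the address space.
        end = min(offset + scan_size_mb, address_space_mb)
        yield (offset, end)
        # Restart from the beginning once the end has been reached.
        offset = 0 if end >= address_space_mb else end


if __name__ == "__main__":
    # Illustrative numbers: a 1000 MB address space scanned 256 MB at a time.
    for window in islice(scan_windows(1000, 256), 5):
        print(window)
```

With these illustrative numbers the fourth window is cut short at 1000 MB and the fifth starts again at 0.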

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases. The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases. The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.
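
The relationship between the two quantities can be made concrete with a little arithmetic. The sketch below assumes the scan rate is simply "scan size" divided by "scan delay", which is one reading of the paragraph above rather than a formula stated in the kernel sources. The function name ``scan_rate_mb_per_s`` and the numbers used are illustrative only.

```python
def scan_rate_mb_per_s(scan_size_mb, scan_delay_ms):
    """Approximate scan rate in MB/s, assuming rate = size / delay."""
    if scan_delay_ms <= 0:
        raise ValueError("scan_delay_ms must be positive")
    return scan_size_mb * 1000.0 / scan_delay_ms


if __name__ == "__main__":
    # Illustrative values, not claimed to be kernel defaults.
    print(scan_rate_mb_per_s(256, 1000))  # 256.0
    print(scan_rate_mb_per_s(256, 500))   # 512.0: halving the delay doubles the rate
    print(scan_rate_mb_per_s(128, 1000))  # 128.0: halving the size halves the rate
```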

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These files control the thresholds for scan delays and
the number of pages scanned.

``scan_period_min_ms`` is the minimum time in milliseconds to scan a
task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``scan_delay_ms`` is the starting "scan delay" used for a task when it
initially forks.

``scan_period_max_ms`` is the maximum time in milliseconds to scan a
task's virtual memory. It effectively controls the minimum scanning
rate for each task.

``scan_size_mb`` is how many megabytes worth of pages are scanned for
a given scan.
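
To see how the minimum and maximum periods bound the scanning rate, the following sketch clamps a period to the configured limits and derives the resulting rate bounds. It assumes, as a simplification, that ``scan_size_mb`` megabytes are scanned once per period. The helpers ``clamp_scan_period`` and ``scan_rate_bounds`` are hypothetical names, and the numbers are illustrative rather than kernel defaults.

```python
def clamp_scan_period(period_ms, period_min_ms, period_max_ms):
    """Keep an adaptive period within [period_min_ms, period_max_ms]."""
    return max(period_min_ms, min(period_ms, period_max_ms))


def scan_rate_bounds(scan_size_mb, period_min_ms, period_max_ms):
    """Return (min_rate, max_rate) in MB/s.

    Assumes scan_size_mb is scanned once per period, so the shortest
    period gives the highest rate and the longest gives the lowest.
    """
    if period_min_ms <= 0 or period_max_ms < period_min_ms:
        raise ValueError("need 0 < period_min_ms <= period_max_ms")
    max_rate = scan_size_mb * 1000.0 / period_min_ms
    min_rate = scan_size_mb * 1000.0 / period_max_ms
    return (min_rate, max_rate)


if __name__ == "__main__":
    # Illustrative values only.
    print(clamp_scan_period(500, 1000, 60000))     # 1000
    print(clamp_scan_period(120000, 1000, 60000))  # 60000
    low, high = scan_rate_bounds(256, 1000, 60000)
    print(f"rate stays between {low:.2f} and {high:.2f} MB/s")
```

Under these assumptions, lowering ``scan_period_min_ms`` raises the ceiling on the scan rate, while raising ``scan_period_max_ms`` lowers the floor.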