Commit

Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/ke…
…rnel/git/akpm/mm
Stephen Rothwell committed Feb 3, 2023
2 parents b3cd2bb + 4f206e6 commit 7ca348e
Showing 416 changed files with 9,302 additions and 5,442 deletions.
39 changes: 39 additions & 0 deletions Documentation/ABI/stable/sysfs-devices-node
@@ -182,3 +182,42 @@ Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
Description:
The total amount of SGX physical memory in bytes.

What: /sys/devices/system/node/nodeX/memory_failure/total
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
The total number of raw poisoned pages (pages containing
corrupted data due to memory errors) on a NUMA node.

What: /sys/devices/system/node/nodeX/memory_failure/ignored
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages were
ignored by the memory error recovery attempt, usually because
support for this type of page is unavailable and the kernel
gives up the recovery.

What: /sys/devices/system/node/nodeX/memory_failure/failed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages the
memory error recovery attempt failed to recover. This usually
means a key recovery operation failed.

What: /sys/devices/system/node/nodeX/memory_failure/delayed
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages had
their memory error recovery delayed. The kernel will usually
retry recovery of delayed poisoned pages.

What: /sys/devices/system/node/nodeX/memory_failure/recovered
Date: January 2023
Contact: Jiaqi Yan <jiaqiyan@google.com>
Description:
Of the raw poisoned pages on a NUMA node, how many pages are
recovered by memory error recovery attempt.
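
The counters above are plain sysfs files, so they can be collected from user space with a few lines of code. The following is a minimal sketch, assuming each file holds a single decimal integer. The helper name ``read_memory_failure_stats`` and its ``sysfs_root`` parameter are hypothetical and exist only so the function can be pointed at a test directory.

```python
from pathlib import Path

# Counter file names taken from the ABI entries above.
COUNTERS = ("total", "ignored", "failed", "delayed", "recovered")


def read_memory_failure_stats(node, sysfs_root="/sys/devices/system/node"):
    """Return {counter: value} for one NUMA node.

    Counters whose file is missing (for example on a kernel without
    this interface) are skipped instead of raising an error.
    """
    base = Path(sysfs_root) / f"node{node}" / "memory_failure"
    stats = {}
    for name in COUNTERS:
        path = base / name
        if path.is_file():
            # Assumption: the file contains one decimal integer.
            stats[name] = int(path.read_text().strip())
    return stats
```

Calling ``read_memory_failure_stats(0)`` on a kernel that provides these files would return the five counters for node 0. On other kernels it returns an empty dictionary.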
41 changes: 28 additions & 13 deletions Documentation/admin-guide/mm/damon/usage.rst
@@ -279,14 +279,25 @@ The ``action`` file is for setting and getting what action you want to apply to
memory regions having specific access pattern of the interest. The keywords
that can be written to and read from the file and their meaning are as below.

- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``
- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
Note that support of each action depends on the running DAMON operations set
`implementation <sysfs_contexts>`.

- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``lru_prio``: Prioritize the region on its LRU lists.
Supported by ``paddr`` operations set.
- ``lru_deprio``: Deprioritize the region on its LRU lists.
- ``stat``: Do nothing but count the statistics
Supported by ``paddr`` operations set.
- ``stat``: Do nothing but count the statistics.
Supported by all operations sets.
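
The support matrix in the list above can be written down as a small lookup table, which is handy when a script wants to validate a scheme before writing it to sysfs. This is only an illustrative sketch: the names ``ACTION_SUPPORT`` and ``is_supported`` are hypothetical, and "all operations sets" is assumed to mean the three sets named in the list (``vaddr``, ``fvaddr`` and ``paddr``).

```python
# Action keyword -> DAMON operations sets that support it,
# transcribed from the list above.
ACTION_SUPPORT = {
    "willneed": {"vaddr", "fvaddr"},
    "cold": {"vaddr", "fvaddr"},
    "pageout": {"vaddr", "fvaddr", "paddr"},
    "hugepage": {"vaddr", "fvaddr"},
    "nohugepage": {"vaddr", "fvaddr"},
    "lru_prio": {"paddr"},
    "lru_deprio": {"paddr"},
    # Assumption: "all operations sets" means these three.
    "stat": {"vaddr", "fvaddr", "paddr"},
}


def is_supported(action, operations):
    """Return True if `action` is listed as supported by `operations`."""
    if action not in ACTION_SUPPORT:
        raise ValueError(f"unknown DAMOS action keyword: {action!r}")
    return operations in ACTION_SUPPORT[action]
```

For example, ``is_supported("pageout", "paddr")`` is true, while ``is_supported("willneed", "paddr")`` is false.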

schemes/<N>/access_pattern/
---------------------------
@@ -388,8 +399,8 @@ pages of all memory cgroups except ``/having_care_already``.::
echo /having_care_already > 1/memcg_path
echo N > 1/matching

Note that filters could be ignored depend on the running DAMON operations set
`implementation <sysfs_contexts>`.
Note that filters are currently supported only when ``paddr``
`implementation <sysfs_contexts>` is being used.

.. _sysfs_schemes_stats:

@@ -618,11 +629,15 @@ The ``<action>`` is a predefined integer for memory management actions, which
DAMON will apply to the regions having the target access pattern. The
supported numbers and their meanings are as below.

- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
- 1: Call ``madvise()`` for the region with ``MADV_COLD``
- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
``target`` is ``paddr``.
- 1: Call ``madvise()`` for the region with ``MADV_COLD``. Ignored if
``target`` is ``paddr``.
- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``. Ignored if
``target`` is ``paddr``.
- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``. Ignored if
``target`` is ``paddr``.
- 5: Do nothing but count the statistics

Quota
7 changes: 7 additions & 0 deletions Documentation/admin-guide/mm/ksm.rst
@@ -173,6 +173,13 @@ stable_node_chains
the number of KSM pages that hit the ``max_page_sharing`` limit
stable_node_dups
number of duplicated KSM pages
zero_pages_sharing
how many empty pages are sharing the kernel zero page(s) instead of
being shared with each other as would normally happen. Only effective
when the ``use_zero_pages`` knob is enabled.

When ``use_zero_pages`` is enabled, the sum of ``pages_sharing`` and
``zero_pages_sharing`` represents how many pages are really saved by KSM.
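
The sum described above can be computed directly from the KSM sysfs directory. The sketch below assumes the counters live under ``/sys/kernel/mm/ksm`` as single integers and that ``zero_pages_sharing`` exists as documented in this change. The function name ``ksm_saved_pages`` is hypothetical, and missing files are treated as zero.

```python
from pathlib import Path


def ksm_saved_pages(ksm_dir="/sys/kernel/mm/ksm"):
    """Return the number of pages saved by KSM, in pages.

    Adds ``zero_pages_sharing`` only when ``use_zero_pages`` is enabled,
    following the rule stated in the text above.
    """
    base = Path(ksm_dir)

    def read_counter(name):
        path = base / name
        # Assumption: a missing file counts as zero.
        return int(path.read_text().strip()) if path.is_file() else 0

    saved = read_counter("pages_sharing")
    if read_counter("use_zero_pages") == 1:
        saved += read_counter("zero_pages_sharing")
    return saved
```

Multiplying the result by the system page size gives an estimate in bytes.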

A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
25 changes: 22 additions & 3 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -453,16 +453,35 @@ this allows system administrators to override the
kexec_load_disabled
===================

A toggle indicating if the ``kexec_load`` syscall has been disabled.
This value defaults to 0 (false: ``kexec_load`` enabled), but can be
set to 1 (true: ``kexec_load`` disabled).
A toggle indicating if the syscalls ``kexec_load`` and
``kexec_file_load`` have been disabled.
This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.

kexec_load_limit_panic
======================

This parameter specifies a limit to the number of times the syscalls
``kexec_load`` and ``kexec_file_load`` can be called with a crash
image. It can only be set with a more restrictive value than the
current one.

== ======================================================
-1 Unlimited calls to kexec. This is the default setting.
N Number of calls left.
== ======================================================

kexec_load_limit_reboot
=======================

Similar functionality as ``kexec_load_limit_panic``, but for a normal
image.
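
The "more restrictive value" rule for ``kexec_load_limit_panic`` and ``kexec_load_limit_reboot`` can be modelled in a few lines. This is a sketch of one reading of the text above, not kernel code: the function name ``accepts_new_limit`` is hypothetical, and treating a write of the same value as "not more restrictive" is an assumption.

```python
UNLIMITED = -1  # documented default: unlimited calls


def accepts_new_limit(current, new):
    """Return True if writing `new` would tighten the limit `current`."""
    if new < 0:
        # Going back to "unlimited" (or below) is never more restrictive.
        return False
    if current == UNLIMITED:
        # Any finite number of remaining calls is stricter than unlimited.
        return True
    # Assumption: the new value must be strictly smaller.
    return new < current
```

Under this model a system that starts at ``-1`` can be lowered to ``1`` and then to ``0``, but can never be raised again.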

kptr_restrict
=============
29 changes: 14 additions & 15 deletions Documentation/core-api/pin_user_pages.rst
@@ -55,18 +55,17 @@ flags the caller provides. The caller is required to pass in a non-null struct
pages* array, and the function then pins pages by incrementing each by a special
value: GUP_PIN_COUNTING_BIAS.

For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
an exact form of pin counting is achieved, by using the 2nd struct page
in the compound page. A new struct page field, compound_pincount, has
been added in order to support this.

This approach for compound pages avoids the counting upper limit problems that
are discussed below. Those limitations would have been aggravated severely by
huge pages, because each tail page adds a refcount to the head page. And in
fact, testing revealed that, without a separate compound_pincount field,
page overflows were seen in some huge page stress tests.

This also means that huge pages and compound pages do not suffer
For large folios, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
the extra space available in the struct folio is used to store the
pincount directly.

This approach for large folios avoids the counting upper limit problems
that are discussed below. Those limitations would have been aggravated
severely by huge pages, because each tail page adds a refcount to the
head page. And in fact, testing revealed that, without a separate pincount
field, refcount overflows were seen in some huge page stress tests.

This also means that huge pages and large folios do not suffer
from the false positives problem that is mentioned below.::

Function
@@ -264,9 +263,9 @@ place.)
Other diagnostics
=================

dump_page() has been enhanced slightly, to handle these new counting
fields, and to better report on compound pages in general. Specifically,
for compound pages, the exact (compound_pincount) pincount is reported.
dump_page() has been enhanced slightly to handle these new counting
fields, and to better report on large folios in general. Specifically,
for large folios, the exact pincount is reported.

References
==========
65 changes: 65 additions & 0 deletions Documentation/fault-injection/fault-injection.rst
@@ -231,6 +231,71 @@ proc entries
This feature is intended for systematic testing of faults in a single
system call. See an example below.


Error Injectable Functions
--------------------------

This part is for kernel developers who are considering marking a function
with the ALLOW_ERROR_INJECTION() macro.

Requirements for the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since function-level error injection forcibly changes the code path
and returns an error even if the input and conditions are proper, it can
cause an unexpected kernel crash if you allow error injection on a function
which is NOT error injectable. Thus, you (and reviewers) must ensure:

- The function returns an error code if it fails, and the callers must check
it correctly (need to recover from it).

- The function does not execute any code which can change any state before
the first error return. The state includes global, local, and input
variables. Examples of such changes are clearing output address storage
(e.g. `*ret = NULL`), incrementing or decrementing a counter, setting a flag,
disabling preemption or irqs, or taking a lock (if those are undone before
returning the error, that is OK).

The first requirement is important, and it means that release (free objects)
functions are usually harder to inject errors into than allocate functions.
If errors of such release functions are not correctly handled, they will
easily cause a memory leak (the caller will be confused about whether the
object has been released or corrupted).

The second one is for callers which expect the function to always do
something. If the function error injection skips the whole function, that
expectation is betrayed and an unexpected error results.

Type of the Error Injectable Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each error injectable function has an error type specified by the
ALLOW_ERROR_INJECTION() macro. You have to choose it carefully if you add
a new error injectable function. If the wrong error type is chosen, the
kernel may crash because it may not be able to handle the error.
There are 4 types of errors defined in include/asm-generic/error-injection.h:

EI_ETYPE_NULL
This function will return `NULL` if it fails, e.g. a function that returns
the address of an allocated object.

EI_ETYPE_ERRNO
This function will return an `-errno` error code if it fails. e.g. return
-EINVAL if the input is wrong. This will include the functions which will
return an address which encodes `-errno` by ERR_PTR() macro.

EI_ETYPE_ERRNO_NULL
This function will return an `-errno` or `NULL` if it fails. If the caller
of this function checks the return value with IS_ERR_OR_NULL() macro, this
type will be appropriate.

EI_ETYPE_TRUE
This function will return `true` (non-zero positive value) if it fails.

If you specify the wrong type, for example EI_ETYPE_ERRNO for a function
which returns an allocated object, it may cause a problem because the returned
value is not an object address and the caller cannot access the address.
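
To make the four error types concrete, the following sketch models which injected return values are consistent with each type. It is a user-space illustration only, not the kernel's implementation: the function name ``is_valid_injected_value`` is hypothetical, error types are passed as plain strings, and ``MAX_ERRNO`` mirrors the kernel constant of the same name (4095).

```python
import errno

MAX_ERRNO = 4095  # largest errno that an ERR_PTR() style value can encode


def is_valid_injected_value(etype, value):
    """Return True if `value` is a failure value consistent with `etype`."""
    is_errno = -MAX_ERRNO <= value <= -1
    if etype == "EI_ETYPE_NULL":
        return value == 0              # NULL
    if etype == "EI_ETYPE_ERRNO":
        return is_errno                # e.g. -EINVAL
    if etype == "EI_ETYPE_ERRNO_NULL":
        return is_errno or value == 0  # matches IS_ERR_OR_NULL() callers
    if etype == "EI_ETYPE_TRUE":
        return value > 0               # "true" as a non-zero positive value
    raise ValueError(f"unknown error type: {etype!r}")
```

With this model, ``is_valid_injected_value("EI_ETYPE_NULL", -errno.EINVAL)`` is false, which corresponds to the mismatch described above: a caller expecting an object address or `NULL` would receive an encoded error instead.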


How to add new fault injection capability
-----------------------------------------

2 changes: 1 addition & 1 deletion Documentation/mm/balance.rst
@@ -6,7 +6,7 @@ Memory Balancing

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
Memory balancing is needed for !__GFP_HIGH and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller can not
22 changes: 14 additions & 8 deletions Documentation/mm/damon/index.rst
@@ -4,8 +4,9 @@
DAMON: Data Access MONitor
==========================

DAMON is a data access monitoring framework subsystem for the Linux kernel.
The core mechanisms of DAMON (refer to :doc:`design` for the detail) make it
DAMON is a Linux kernel subsystem that provides a framework for data access
monitoring and for system operations based on the monitoring results. The core
monitoring mechanisms of DAMON (refer to :doc:`design` for the detail) make it

- *accurate* (the monitoring output is useful enough for DRAM level memory
management; it might not be appropriate for CPU cache levels, though),
@@ -14,16 +15,21 @@ The core mechanisms of DAMON (refer to :doc:`design` for the detail) make it
- *scalable* (the upper-bound of the overhead is in constant range regardless
of the size of target workloads).

Using this framework, therefore, the kernel's memory management mechanisms can
make advanced decisions. Experimental memory management optimization works
that incurring high data accesses monitoring overhead could implemented again.
In user space, meanwhile, users who have some special workloads can write
personalized applications for better understanding and optimizations of their
workloads and systems.
Using this framework, therefore, the kernel can operate the system in an
access-aware fashion. Because the features are also exposed to user space,
users who have special information about their workloads can write personalized
applications to better understand and optimize their workloads and systems.

For easier development of such systems, DAMON provides a feature called DAMOS
(DAMon-based Operation Schemes) in addition to the monitoring. Using the
feature, DAMON users in both kernel and user spaces can do access-aware system
operations through simple configuration, without writing code.

.. toctree::
:maxdepth: 2

faq
design
api
maintainer-profile
62 changes: 62 additions & 0 deletions Documentation/mm/damon/maintainer-profile.rst
@@ -0,0 +1,62 @@
.. SPDX-License-Identifier: GPL-2.0

DAMON Maintainer Entry Profile
==============================

The DAMON subsystem covers the files that are listed in the 'DATA ACCESS
MONITOR' section of the 'MAINTAINERS' file.

The mailing lists for the subsystem are damon@lists.linux.dev and
linux-mm@kvack.org. Patches should be made against the mm-unstable tree [1]_
whenever possible and posted to the mailing lists.

SCM Trees
---------

There are multiple Linux trees for DAMON development. Patches under
development or testing are queued in damon/next [2]_ by the DAMON maintainer.
Sufficiently reviewed patches will be queued in mm-unstable [1]_ by the memory
management subsystem maintainer. After further testing, the patches will
be queued in mm-stable [3]_ , and finally pull-requested to the mainline by the
memory management subsystem maintainer.

Note again that patches for review should be made against the mm-unstable
tree [1]_ whenever possible. damon/next is only for previewing others' work in
progress.

Submit checklist addendum
-------------------------

When making DAMON changes, you should do the following.

- Build the outputs affected by the change, including the kernel and documents.
- Ensure the builds introduce no new errors or warnings.
- Run and ensure no new failures for DAMON selftests [4]_ and kunittests [5]_ .

Additionally doing the following and sharing the results will be helpful.

- Run damon-tests/corr [6]_ for normal changes.
- Run damon-tests/perf [7]_ for performance changes.

Key cycle dates
---------------

Patches can be sent anytime. Key cycle dates of the mm-unstable [1]_ and
mm-stable [3]_ trees depend on the memory management subsystem maintainer.

Review cadence
--------------

The DAMON maintainer works during usual working hours (09:00 to 17:00,
Mon-Fri) in PST. Responses to patches will occasionally be slow. Do not
hesitate to send a ping if you have not heard back within a week of sending a
patch.


.. [1] https://git.kernel.org/akpm/mm/h/mm-unstable
.. [2] https://git.kernel.org/sj/h/damon/next
.. [3] https://git.kernel.org/akpm/mm/h/mm-stable
.. [4] https://github.com/awslabs/damon-tests/blob/master/corr/run.sh#L49
.. [5] https://github.com/awslabs/damon-tests/blob/master/corr/tests/kunit.sh
.. [6] https://github.com/awslabs/damon-tests/tree/master/corr
.. [7] https://github.com/awslabs/damon-tests/tree/master/perf