Merge branch 'rework/console-list-lock' into for-linus
Petr Mladek committed Dec 8, 2022
2 parents 7365df1 + 5074ffb commit 6b2b0d8
Showing 2,944 changed files with 142,228 additions and 36,098 deletions.
1 change: 1 addition & 0 deletions .clang-format
@@ -222,6 +222,7 @@ ForEachMacros:
- 'for_each_component_dais'
- 'for_each_component_dais_safe'
- 'for_each_console'
- 'for_each_console_srcu'
- 'for_each_cpu'
- 'for_each_cpu_and'
- 'for_each_cpu_not'
1 change: 1 addition & 0 deletions .mailmap
@@ -137,6 +137,7 @@ Filipe Lautert <filipe@icewall.org>
Finn Thain <fthain@linux-m68k.org> <fthain@telegraphics.com.au>
Franck Bui-Huu <vagabon.xyz@gmail.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@am.sony.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@sony.com>
Frank Rowand <frowand.list@gmail.com> <frank.rowand@sonymobile.com>
Frank Rowand <frowand.list@gmail.com> <frowand@mvista.com>
Frank Zago <fzago@systemfabricworks.com>
33 changes: 33 additions & 0 deletions Documentation/ABI/testing/sysfs-bus-pci
@@ -457,3 +457,36 @@ Description:

The file is writable if the PF is bound to a driver that
implements ->sriov_set_msix_vec_count().

What: /sys/bus/pci/devices/.../resourceN_resize
Date: September 2022
Contact: Alex Williamson <alex.williamson@redhat.com>
Description:
These files provide an interface to PCIe Resizable BAR support.
A file is created for each BAR resource (N) supported by the
PCIe Resizable BAR extended capability of the device. Reading
each file exposes the bitmap of available resource sizes:

# cat resource1_resize
00000000000001c0

The bitmap represents supported resource sizes for the BAR,
where bit0 = 1MB, bit1 = 2MB, bit2 = 4MB, etc. In the above
example the device supports 64MB, 128MB, and 256MB BAR sizes.

When writing the file, the user provides the bit position of
the desired resource size, for example:

# echo 7 > resource1_resize

This indicates to set the size value corresponding to bit 7,
128MB. The resulting size is 2 ^ (bit# + 20). This definition
matches the PCIe specification of this capability.

In order to make use of resource resizing, all PCI drivers must
be unbound from the device and peer devices under the same
parent bridge may need to be soft removed. In the case of
VGA devices, writing a resize value will remove low level
console drivers from the device. Raw users of pci-sysfs
resourceN attributes must be terminated prior to resizing.
Success of the resizing operation is not guaranteed.
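
As a rough illustration of the bitmap encoding described for resourceN_resize above, the following Python sketch decodes the hexadecimal text read from such a file into the supported BAR sizes, using the stated rule that bit N corresponds to 2 ^ (N + 20) bytes. The function name decode_rebar_bitmap is hypothetical and not part of the kernel or this commit; the sketch only does arithmetic on a string and does not touch sysfs.

```python
def decode_rebar_bitmap(hex_text):
    """Map each set bit position to its BAR size in bytes.

    Assumes hex_text is the hexadecimal bitmap as shown in the
    documentation example, e.g. "00000000000001c0".
    """
    bitmap = int(hex_text.strip(), 16)
    sizes = {}
    bit = 0
    # Walk the bitmap from bit 0 upward until no set bits remain.
    while bitmap >> bit:
        if (bitmap >> bit) & 1:
            sizes[bit] = 1 << (bit + 20)  # bit0 = 1MB, bit1 = 2MB, ...
        bit += 1
    return sizes


if __name__ == "__main__":
    for bit, size in decode_rebar_bitmap("00000000000001c0").items():
        print(f"bit {bit}: {size >> 20} MB")
```

The bit position printed for the desired size is the value that would be written back to the file, as in the "echo 7" example for 128MB.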
8 changes: 8 additions & 0 deletions Documentation/ABI/testing/sysfs-devices-vfio-dev
@@ -0,0 +1,8 @@
What: /sys/.../<device>/vfio-dev/vfioX/
Date: September 2022
Contact: Yi Liu <yi.l.liu@intel.com>
Description:
This directory is created when the device is bound to a
vfio driver. The layout under this directory matches what
exists for a standard 'struct device'. 'X' is a unique
index marking this device in vfio.
24 changes: 24 additions & 0 deletions Documentation/ABI/testing/sysfs-fs-f2fs
@@ -466,6 +466,30 @@ Description: Show status of f2fs superblock in real time.
0x4000 SBI_IS_FREEZING freefs is in process
====== ===================== =================================

What: /sys/fs/f2fs/<disk>/stat/cp_status
Date: September 2022
Contact: "Chao Yu" <chao.yu@oppo.com>
Description: Show status of f2fs checkpoint in real time.

=============================== ==============================
cp flag value
CP_UMOUNT_FLAG 0x00000001
CP_ORPHAN_PRESENT_FLAG 0x00000002
CP_COMPACT_SUM_FLAG 0x00000004
CP_ERROR_FLAG 0x00000008
CP_FSCK_FLAG 0x00000010
CP_FASTBOOT_FLAG 0x00000020
CP_CRC_RECOVERY_FLAG 0x00000040
CP_NAT_BITS_FLAG 0x00000080
CP_TRIMMED_FLAG 0x00000100
CP_NOCRC_RECOVERY_FLAG 0x00000200
CP_LARGE_NAT_BITMAP_FLAG 0x00000400
CP_QUOTA_NEED_FSCK_FLAG 0x00000800
CP_DISABLED_FLAG 0x00001000
CP_DISABLED_QUICK_FLAG 0x00002000
CP_RESIZEFS_FLAG 0x00004000
=============================== ==============================

What: /sys/fs/f2fs/<disk>/ckpt_thread_ioprio
Date: January 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
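
The cp_status flag table added in this hunk can be read as a bitmask. A minimal Python sketch of decoding such a value is shown here, with the flag values copied from the table. The name decode_cp_status is hypothetical, and how the value is printed by the sysfs file is not stated in the diff, so the caller is assumed to have already parsed it into an integer.

```python
# Flag values copied from the cp_status table in the f2fs ABI documentation.
CP_FLAGS = {
    0x00000001: "CP_UMOUNT_FLAG",
    0x00000002: "CP_ORPHAN_PRESENT_FLAG",
    0x00000004: "CP_COMPACT_SUM_FLAG",
    0x00000008: "CP_ERROR_FLAG",
    0x00000010: "CP_FSCK_FLAG",
    0x00000020: "CP_FASTBOOT_FLAG",
    0x00000040: "CP_CRC_RECOVERY_FLAG",
    0x00000080: "CP_NAT_BITS_FLAG",
    0x00000100: "CP_TRIMMED_FLAG",
    0x00000200: "CP_NOCRC_RECOVERY_FLAG",
    0x00000400: "CP_LARGE_NAT_BITMAP_FLAG",
    0x00000800: "CP_QUOTA_NEED_FSCK_FLAG",
    0x00001000: "CP_DISABLED_FLAG",
    0x00002000: "CP_DISABLED_QUICK_FLAG",
    0x00004000: "CP_RESIZEFS_FLAG",
}


def decode_cp_status(value):
    """Return the names of all checkpoint flags set in an integer value."""
    return [name for bit, name in CP_FLAGS.items() if value & bit]


if __name__ == "__main__":
    # 0x85 = 0x80 | 0x04 | 0x01
    print(decode_cp_status(0x85))
```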
8 changes: 8 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-livepatch
@@ -55,6 +55,14 @@ Description:
The object directory contains subdirectories for each function
that is patched within the object.

What: /sys/kernel/livepatch/<patch>/<object>/patched
Date: August 2022
KernelVersion: 6.1.0
Contact: live-patching@vger.kernel.org
Description:
An attribute which indicates whether the object is currently
patched.

What: /sys/kernel/livepatch/<patch>/<object>/<function,sympos>
Date: Nov 2014
KernelVersion: 3.19.0
25 changes: 25 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-mm-memory-tiers
@@ -0,0 +1,25 @@
What: /sys/devices/virtual/memory_tiering/
Date: August 2022
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: A collection of all the memory tiers allocated.

Individual memory tier details are contained in subdirectories
named by the abstract distance of the memory tier.

/sys/devices/virtual/memory_tiering/memory_tierN/


What: /sys/devices/virtual/memory_tiering/memory_tierN/
/sys/devices/virtual/memory_tiering/memory_tierN/nodes
Date: August 2022
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Directory with details of a specific memory tier

This is the directory containing information about a particular
memory tier, memtierN, where N is derived based on abstract distance.

A smaller value of N implies a higher (faster) memory tier in the
hierarchy.

nodes: NUMA nodes that are part of this memory tier.

2 changes: 1 addition & 1 deletion Documentation/accounting/delay-accounting.rst
@@ -13,7 +13,7 @@ a) waiting for a CPU (while being runnable)
b) completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
-e) thrashing page cache
+e) thrashing
f) direct compact
g) write-protect copy

4 changes: 1 addition & 3 deletions Documentation/admin-guide/cgroup-v1/memory.rst
@@ -299,7 +299,7 @@ Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
lruvec->lru_lock; PG_lru bit of page->flags is cleared before
isolating a page from its LRU under lruvec->lru_lock.

-2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
+2.7 Kernel Memory Extension
-----------------------------------------------

With the Kernel memory extension, the Memory Controller is able to limit
@@ -386,8 +386,6 @@ U != 0, K >= U:

a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
-c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
-d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
-------------------------------------------------------------------
23 changes: 23 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -976,6 +976,29 @@ All cgroup core files are prefixed with "cgroup."
killing cgroups is a process directed operation, i.e. it affects
the whole thread-group.

cgroup.pressure
A read-write single value file that allowed values are "0" and "1".
The default is "1".

Writing "0" to the file will disable the cgroup PSI accounting.
Writing "1" to the file will re-enable the cgroup PSI accounting.

This control attribute is not hierarchical, so disable or enable PSI
accounting in a cgroup does not affect PSI accounting in descendants
and doesn't need pass enablement via ancestors from root.

The reason this control attribute exists is that PSI accounts stalls for
each cgroup separately and aggregates it at each level of the hierarchy.
This may cause non-negligible overhead for some workloads when under
deep level of the hierarchy, in which case this control attribute can
be used to disable PSI accounting in the non-leaf cgroups.

irq.pressure
A read-write nested-keyed file.

Shows pressure stall information for IRQ/SOFTIRQ. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.

Controllers
===========

22 changes: 16 additions & 6 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -321,6 +321,8 @@
force_enable - Force enable the IOMMU on platforms known
to be buggy with IOMMU enabled. Use this
option with care.
pgtbl_v1 - Use v1 page table for DMA-API (Default).
pgtbl_v2 - Use v2 page table for DMA-API.

amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
@@ -1467,6 +1469,14 @@
Permit 'security.evm' to be updated regardless of
current integrity status.

early_page_ext [KNL] Enforces page_ext initialization to earlier
stages so cover more early boot allocations.
Please note that as side effect some optimizations
might be disabled to achieve that (e.g. parallelized
memory initialization is disabled) so the boot process
might take longer, especially on systems with a lot of
memory. Available with CONFIG_PAGE_EXTENSION=y.

failslab=
fail_usercopy=
fail_page_alloc=
@@ -6039,12 +6049,6 @@
This parameter controls use of the Protected
Execution Facility on pSeries.

-swapaccount= [KNL]
-Format: [0|1]
-Enable accounting of swap in memory resource
-controller if no parameter or 1 is given or disable
-it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
-
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
Format: { <int> [,<int>] | force | noforce }
<int> -- Number of I/O TLB slabs
@@ -6847,6 +6851,12 @@
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.

xen_msr_safe= [X86,XEN]
Format: <bool>
Select whether to always use non-faulting (safe) MSR
access functions when running as Xen PV guest. The
default value is controlled by CONFIG_XEN_PV_MSR_SAFE.

xen_nopvspin [X86,XEN]
Disables the qspinlock slowpath using Xen PV optimizations.
This parameter is obsoleted by "nopvspin" parameter, which
10 changes: 5 additions & 5 deletions Documentation/admin-guide/mm/cma_debugfs.rst
@@ -5,10 +5,10 @@ CMA Debugfs Interface
The CMA debugfs interface is useful to retrieve basic information out of the
different CMA areas and to test allocation/release in each of the areas.

-Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
-kernel's CMA index. So the first CMA zone would be:
+Each CMA area represents a directory under <debugfs>/cma/, represented by
+its CMA name like below:

-<debugfs>/cma/cma-0
+<debugfs>/cma/<cma_name>

The structure of the files created under that directory is as follows:

@@ -18,8 +18,8 @@ The structure of the files created under that directory is as follows:
- [RO] bitmap: The bitmap of page states in the zone.
- [WO] alloc: Allocate N pages from that CMA area. For example::

-echo 5 > <debugfs>/cma/cma-2/alloc
+echo 5 > <debugfs>/cma/<cma_name>/alloc

-would try to allocate 5 pages from the cma-2 area.
+would try to allocate 5 pages from the 'cma_name' area.

- [WO] free: Free N pages from that CMA area, similar to the above.
6 changes: 3 additions & 3 deletions Documentation/admin-guide/mm/damon/index.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-========================
-Monitoring Data Accesses
-========================
+==========================
+DAMON: Data Access MONitor
+==========================

:doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
Using DAMON, users can analyze the memory access patterns of their systems and
13 changes: 3 additions & 10 deletions Documentation/admin-guide/mm/damon/start.rst
@@ -29,16 +29,9 @@ called DAMON Operator (DAMO). It is available at
https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
your ``$PATH``. It's not mandatory, though.

-Because DAMO is using the debugfs interface (refer to :doc:`usage` for the
-detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as
-below::
-
-# mount -t debugfs none /sys/kernel/debug/
-
-or append the following line to your ``/etc/fstab`` file so that your system
-can automatically mount debugfs upon booting::
-
-debugfs /sys/kernel/debug debugfs defaults 0 0
+Because DAMO is using the sysfs interface (refer to :doc:`usage` for the
+detail) of DAMON, you should ensure :doc:`sysfs </filesystems/sysfs>` is
+mounted.


Recording Data Access Patterns
5 changes: 5 additions & 0 deletions Documentation/admin-guide/mm/damon/usage.rst
@@ -393,6 +393,11 @@ the files as above. Above is only for an example.
debugfs Interface
=================

.. note::

DAMON debugfs interface will be removed after next LTS kernel is released, so
users should move to the :ref:`sysfs interface <sysfs_interface>`.

DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
1 change: 1 addition & 0 deletions Documentation/admin-guide/mm/index.rst
@@ -32,6 +32,7 @@ the Linux memory management.
idle_page_tracking
ksm
memory-hotplug
multigen_lru
nommu-mmap
numa_memory_policy
numaperf
36 changes: 36 additions & 0 deletions Documentation/admin-guide/mm/ksm.rst
@@ -184,6 +184,42 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
be increased accordingly.

Monitoring KSM profit
=====================

KSM can save memory by merging identical pages, but also can consume
additional memory, because it needs to generate a number of rmap_items to
save each scanned page's brief rmap information. Some of these pages may
be merged, but some may not be abled to be merged after being checked
several times, which are unprofitable memory consumed.

1) How to determine whether KSM save memory or consume memory in system-wide
range? Here is a simple approximate calculation for reference::

general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
sizeof(rmap_item);

where all_rmap_items can be easily obtained by summing ``pages_sharing``,
``pages_shared``, ``pages_unshared`` and ``pages_volatile``.

2) The KSM profit inner a single process can be similarly obtained by the
following approximate calculation::

process_profit =~ ksm_merging_pages * sizeof(page) -
ksm_rmap_items * sizeof(rmap_item).

where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.

From the perspective of application, a high ratio of ``ksm_rmap_items`` to
``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
administrators have to rethink how to change madvise policy. Giving an example
for reference, a page's size is usually 4K, and the rmap_item's size is
separately 32B on 32-bit CPU architecture and 64B on 64-bit CPU architecture.
so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on 64-bit CPU
or exceeds 128 on 32-bit CPU, then the app's madvise policy should be dropped,
because the ksm profit is approximately zero or negative.
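
The two approximate profit formulas in the "Monitoring KSM profit" section above can be sketched in Python as follows. The function names and default sizes are assumptions for illustration: a 4K page and a 64B rmap_item are the figures the text gives for a 64-bit CPU, and the counters are passed in as plain integers rather than read from /sys or /proc.

```python
def general_profit(pages_sharing, pages_shared, pages_unshared,
                   pages_volatile, page_size=4096, rmap_item_size=64):
    """System-wide estimate: bytes saved minus bytes spent on rmap_items."""
    # all_rmap_items is the sum of the four counters, per the documentation.
    all_rmap_items = (pages_sharing + pages_shared
                      + pages_unshared + pages_volatile)
    return pages_sharing * page_size - all_rmap_items * rmap_item_size


def process_profit(ksm_merging_pages, ksm_rmap_items,
                   page_size=4096, rmap_item_size=64):
    """Per-process estimate using the same approximation."""
    return ksm_merging_pages * page_size - ksm_rmap_items * rmap_item_size


if __name__ == "__main__":
    # At a ratio of 64 rmap_items per merged page the estimate is zero
    # for the 64-bit sizes, matching the threshold in the text.
    print(process_profit(1, 64))
    print(process_profit(100, 1000))
```

With the 32-bit figure of 32B per rmap_item, passing rmap_item_size=32 moves the break-even ratio to 128, as the text states.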

Monitoring KSM events
=====================
