Merge branch 'mm-everything' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

# Conflicts:
#	mm/damon/sysfs.c
#	mm/gup.c
#	mm/huge_memory.c
Stephen Rothwell committed Nov 29, 2022
2 parents 5bfd8a5 + a722cd8 commit f8903bd
Showing 226 changed files with 9,919 additions and 5,300 deletions.
1 change: 1 addition & 0 deletions .clang-format
@@ -136,6 +136,7 @@ ForEachMacros:
- 'data__for_each_file'
- 'data__for_each_file_new'
- 'data__for_each_file_start'
- 'dax_for_each_folio'
- 'device_for_each_child_node'
- 'displayid_iter_for_each'
- 'dma_fence_array_for_each'
1 change: 1 addition & 0 deletions .mailmap
@@ -228,6 +228,7 @@ Juha Yrjola <at solidboot.com>
Juha Yrjola <juha.yrjola@nokia.com>
Juha Yrjola <juha.yrjola@solidboot.com>
Julien Thierry <julien.thierry.kdev@gmail.com> <julien.thierry@arm.com>
Iskren Chernev <me@iskren.info> <iskren.chernev@gmail.com>
Kalle Valo <kvalo@kernel.org> <kvalo@codeaurora.org>
Kalyan Thota <quic_kalyant@quicinc.com> <kalyan_t@codeaurora.org>
Kay Sievers <kay.sievers@vrfy.org>
14 changes: 14 additions & 0 deletions Documentation/ABI/testing/sysfs-block-zram
@@ -137,3 +137,17 @@ Description:
The writeback_limit file is read-write and specifies the maximum
amount of writeback ZRAM can do. The limit could be changed
in run time.

What: /sys/block/zram<id>/recomp_algorithm
Date: November 2022
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The recomp_algorithm file is read-write and allows to set
or show secondary compression algorithms.

What: /sys/block/zram<id>/recompress
Date: November 2022
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The recompress file is write-only and triggers re-compression
with secondary compression algorithms.
68 changes: 68 additions & 0 deletions Documentation/ABI/testing/sysfs-class-bdi
@@ -44,6 +44,21 @@ Description:

(read-write)

What: /sys/class/bdi/<bdi>/min_ratio_fine
Date: November 2022
Contact: Stefan Roesch <shr@devkernel.io>
Description:
Under normal circumstances each device is given a part of the
total write-back cache that relates to its current average
writeout speed in relation to the other devices.

The 'min_ratio_fine' parameter allows assigning a minimum reserve
of the write-back cache to a particular device. The value is
expressed as part of 1 million. For example, this is useful for
providing a minimum QoS.

(read-write)

What: /sys/class/bdi/<bdi>/max_ratio
Date: January 2008
Contact: Peter Zijlstra <a.p.zijlstra@chello.nl>
@@ -55,6 +70,59 @@ Description:
mount that is prone to get stuck, or a FUSE mount which cannot
be trusted to play fair.

(read-write)

What: /sys/class/bdi/<bdi>/max_ratio_fine
Date: November 2022
Contact: Stefan Roesch <shr@devkernel.io>
Description:
Allows limiting a particular device to use not more than the
given value of the write-back cache. The value is given as part
of 1 million. This is useful in situations where we want to avoid
one device taking all or most of the write-back cache. For example
in case of an NFS mount that is prone to get stuck, or a FUSE mount
which cannot be trusted to play fair.

(read-write)

What: /sys/class/bdi/<bdi>/min_bytes
Date: October 2022
Contact: Stefan Roesch <shr@devkernel.io>
Description:
Under normal circumstances each device is given a part of the
total write-back cache that relates to its current average
writeout speed in relation to the other devices.

The 'min_bytes' parameter allows assigning a minimum
percentage of the write-back cache to a particular device
expressed in bytes.
For example, this is useful for providing a minimum QoS.

(read-write)

What: /sys/class/bdi/<bdi>/max_bytes
Date: October 2022
Contact: Stefan Roesch <shr@devkernel.io>
Description:
Allows limiting a particular device to use not more than the
given 'max_bytes' of the write-back cache. This is useful in
situations where we want to avoid one device taking all or
most of the write-back cache. For example in case of an NFS
mount that is prone to get stuck, a FUSE mount which cannot be
trusted to play fair, or a nbd device.

(read-write)

What: /sys/class/bdi/<bdi>/strict_limit
Date: October 2022
Contact: Stefan Roesch <shr@devkernel.io>
Description:
Forces per-BDI checks for the share of given device in the write-back
cache even before the global background dirty limit is reached. This
is useful in situations where the global limit is much higher than
affordable for given relatively slow (or untrusted) device. Turning
strictlimit on has no visible effect if max_ratio is equal to 100%.

(read-write)
What: /sys/class/bdi/<bdi>/stable_pages_required
Date: January 2008
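
The `min_ratio_fine` and `max_ratio_fine` descriptions above give their values as parts of 1 million, so 1% corresponds to 10,000. A minimal sketch of that conversion follows; `percent_to_ppm` is a hypothetical helper written for illustration, not something this commit provides, and the sysfs write is shown only as a comment because it needs root and a kernel that carries this change.

```shell
#!/bin/sh
# percent_to_ppm PERCENT
# Hypothetical helper: convert a percentage (fractions allowed) to parts
# per million, the unit the *_ratio_fine files are documented to use.
percent_to_ppm() {
    awk -v p="$1" 'BEGIN { printf "%.0f\n", p * 10000 }'
}

percent_to_ppm 5      # prints 50000
percent_to_ppm 0.5    # prints 5000

# Assumed usage (not executed here; <bdi> is a placeholder):
#   percent_to_ppm 0.5 > /sys/class/bdi/<bdi>/min_ratio_fine
```

The finer unit is what lets a reserve or cap below 1% be expressed at all, which a whole-percent knob such as `max_ratio` cannot do.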
32 changes: 32 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -27,6 +27,10 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
makes the kdamond reads the user inputs in the sysfs files
except 'state' again. Writing 'update_schemes_stats' to the
file updates contents of schemes stats files of the kdamond.
Writing 'update_schemes_tried_regions' to the file updates
contents of 'tried_regions' directory of every scheme directory
of this kdamond. Writing 'clear_schemes_tried_regions' to the
file removes contents of the 'tried_regions' directory.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/pid
Date: Mar 2022
@@ -283,3 +287,31 @@ Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the number of the exceed events of
the scheme's quotas.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/start
Date: Oct 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the start address of a memory region
that corresponding DAMON-based Operation Scheme's action has
tried to be applied.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/end
Date: Oct 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the end address of a memory region
that corresponding DAMON-based Operation Scheme's action has
tried to be applied.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/nr_accesses
Date: Oct 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the 'nr_accesses' of a memory region
that corresponding DAMON-based Operation Scheme's action has
tried to be applied.

What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/age
Date: Oct 2022
Contact: SeongJae Park <sj@kernel.org>
Description: Reading this file returns the 'age' of a memory region that
corresponding DAMON-based Operation Scheme's action has tried
to be applied.
100 changes: 96 additions & 4 deletions Documentation/admin-guide/blockdev/zram.rst
@@ -348,8 +348,13 @@ this can be accomplished with::

 echo huge_idle > /sys/block/zramX/writeback
 
+If a user chooses to writeback only incompressible pages (pages that none of
+algorithms can compress) this can be accomplished with::
+
+echo incompressible > /sys/block/zramX/writeback
+
 If an admin wants to write a specific page in zram device to the backing device,
-they could write a page index into the interface.
+they could write a page index into the interface::
 
 echo "page_index=1251" > /sys/block/zramX/writeback

@@ -401,6 +406,87 @@ budget in next setting is user's job.
If admin wants to measure writeback count in a certain period, they could
know it via /sys/block/zram0/bd_stat's 3rd column.

recompression
-------------

With CONFIG_ZRAM_MULTI_COMP, zram can recompress pages using alternative
(secondary) compression algorithms. The basic idea is that alternative
compression algorithm can provide better compression ratio at a price of
(potentially) slower compression/decompression speeds. Alternative compression
algorithm can, for example, be more successful compressing huge pages (those
that default algorithm failed to compress). Another application is idle pages
recompression - pages that are cold and sit in the memory can be recompressed
using more effective algorithm and, hence, reduce zsmalloc memory usage.

With CONFIG_ZRAM_MULTI_COMP, zram supports up to 4 compression algorithms:
one primary and up to 3 secondary ones. Primary zram compressor is explained
in "3) Select compression algorithm", secondary algorithms are configured
using recomp_algorithm device attribute.

Example:::

#show supported recompression algorithms
cat /sys/block/zramX/recomp_algorithm
#1: lzo lzo-rle lz4 lz4hc [zstd]
#2: lzo lzo-rle lz4 [lz4hc] zstd

Alternative compression algorithms are sorted by priority. In the example
above, zstd is used as the first alternative algorithm, which has priority
of 1, while lz4hc is configured as a compression algorithm with priority 2.
Alternative compression algorithm's priority is provided during algorithms
configuration:::

#select zstd recompression algorithm, priority 1
echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm

#select deflate recompression algorithm, priority 2
echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm

Another device attribute that CONFIG_ZRAM_MULTI_COMP enables is recompress,
which controls recompression.

Examples:::

#IDLE pages recompression is activated by `idle` mode
echo "type=idle" > /sys/block/zramX/recompress

#HUGE pages recompression is activated by `huge` mode
echo "type=huge" > /sys/block/zram0/recompress

#HUGE_IDLE pages recompression is activated by `huge_idle` mode
echo "type=huge_idle" > /sys/block/zramX/recompress

The number of idle pages can be significant, so user-space can pass a size
threshold (in bytes) to the recompress knob: zram will recompress only pages
of equal or greater size:::

#recompress all pages larger than 3000 bytes
echo "threshold=3000" > /sys/block/zramX/recompress

#recompress idle pages larger than 2000 bytes
echo "type=idle threshold=2000" > /sys/block/zramX/recompress

Recompression of idle pages requires memory tracking.

During re-compression for every page, that matches re-compression criteria,
ZRAM iterates the list of registered alternative compression algorithms in
order of their priorities. ZRAM stops either when re-compression was
successful (re-compressed object is smaller in size than the original one)
and matches re-compression criteria (e.g. size threshold) or when there are
no secondary algorithms left to try. If none of the secondary algorithms can
successfully re-compressed the page such a page is marked as incompressible,
so ZRAM will not attempt to re-compress it in the future.

This re-compression behaviour, when it iterates through the list of
registered compression algorithms, increases our chances of finding the
algorithm that successfully compresses a particular page. Sometimes, however,
it is convenient (and sometimes even necessary) to limit recompression to
only one particular algorithm so that it will not try any other algorithms.
This can be achieved by providing a algo=NAME parameter:::

#use zstd algorithm only (if registered)
echo "type=huge algo=zstd" > /sys/block/zramX/recompress
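
The selection loop described in the recompression text above can be modelled with a toy script. This is only a sketch under stated assumptions: it never touches a zram device, `try_recompress` and its `ALGO:SIZE` arguments are made up for illustration, the per-algorithm sizes are invented inputs, and it models only the two rules the text states plainly (pages below the threshold are not candidates, and an algorithm succeeds when its result is smaller than the original). Any further criteria the kernel applies are left out.

```shell
#!/bin/sh
# try_recompress ORIG_SIZE THRESHOLD ALGO:SIZE...
# Hypothetical model: ALGO:SIZE pairs are given in priority order, and SIZE
# is the (invented) size ALGO would produce for this page.
try_recompress() {
    orig=$1
    threshold=$2
    shift 2

    # Only pages of equal or greater size than the threshold are candidates.
    if [ "$orig" -lt "$threshold" ]; then
        echo "skipped"
        return 0
    fi

    # Walk the secondary algorithms in priority order; stop at the first
    # one whose output is smaller than the original object.
    for candidate in "$@"; do
        algo=${candidate%%:*}
        size=${candidate##*:}
        if [ "$size" -lt "$orig" ]; then
            echo "$algo"
            return 0
        fi
    done

    # No secondary algorithm helped: the page is marked incompressible.
    echo "incompressible"
}

try_recompress 3500 3000 zstd:3600 lz4hc:2800   # prints lz4hc
try_recompress 3500 3000 zstd:3600 lz4hc:3500   # prints incompressible
try_recompress 2000 3000 zstd:100               # prints skipped
```

Passing a single `ALGO:SIZE` pair corresponds to the `algo=NAME` case, where only one algorithm is tried.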

memory tracking
===============

@@ -411,9 +497,11 @@ pages of the process with*pagemap.
 If you enable the feature, you could see block state via
 /sys/kernel/debug/zram/zram0/block_state". The output is as follows::
 
-300 75.033841 .wh.
-301 63.806904 s...
-302 63.806919 ..hi
+300 75.033841 .wh...
+301 63.806904 s.....
+302 63.806919 ..hi..
+303 62.801919 ....r.
+304 146.781902 ..hi.n
 
 First column
 zram's block index.
@@ -430,6 +518,10 @@ Third column
huge page
i:
idle page
r:
recompressed page (secondary compression algorithm)
n:
none (including secondary) of algorithms could compress it

First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
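
Because the third column of the block_state output documented above is a string of single-letter flags, blocks in a given state can be picked out with a small filter. The sketch below is an illustration only: `list_blocks` is a hypothetical helper, and it runs against a copy of the sample output from the hunk rather than the real debugfs file, which needs root and memory tracking enabled.

```shell
#!/bin/sh
# list_blocks FLAG
# Hypothetical helper: read block_state-style lines on stdin and print the
# index (first column) of every block whose state column contains FLAG.
list_blocks() {
    awk -v flag="$1" 'index($3, flag) > 0 { print $1 }'
}

# Sample copied from the documentation hunk above.
sample='300 75.033841 .wh...
301 63.806904 s.....
302 63.806919 ..hi..
303 62.801919 ....r.
304 146.781902 ..hi.n'

printf '%s\n' "$sample" | list_blocks r   # prints 303
printf '%s\n' "$sample" | list_blocks n   # prints 304

# Assumed usage on a live system (not executed here):
#   list_blocks n < /sys/kernel/debug/zram/zram0/block_state
```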
3 changes: 2 additions & 1 deletion Documentation/admin-guide/cgroup-v1/memory.rst
@@ -543,7 +543,8 @@ inactive_anon # of bytes of anonymous and swap cache memory on inactive
 LRU list.
 active_anon # of bytes of anonymous and swap cache memory on active
 LRU list.
-inactive_file # of bytes of file-backed memory on inactive LRU list.
+inactive_file # of bytes of file-backed memory and MADV_FREE anonymous memory(
+LazyFree pages) on inactive LRU list.
 active_file # of bytes of file-backed memory on active LRU list.
 unevictable # of bytes of memory that cannot be reclaimed (mlocked etc).
 =============== ===============================================================
6 changes: 6 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1488,12 +1488,18 @@ PAGE_SIZE multiple when read back.
pgscan_direct (npn)
Amount of scanned pages directly (in an inactive LRU list)

pgscan_khugepaged (npn)
Amount of scanned pages by khugepaged (in an inactive LRU list)

pgsteal_kswapd (npn)
Amount of reclaimed pages by kswapd

pgsteal_direct (npn)
Amount of reclaimed pages directly

pgsteal_khugepaged (npn)
Amount of reclaimed pages by khugepaged

pgfault (npn)
Total number of page faults incurred
