Skip to content

Commit

Permalink
Merge branch 'mm-stable' of git://git.kernel.org/pub/scm/linux/kernel…
Browse files Browse the repository at this point in the history
…/git/akpm/mm
  • Loading branch information
Stephen Rothwell committed Sep 16, 2022
2 parents 7fcb7bf + 088b8aa commit 818ae0d
Show file tree
Hide file tree
Showing 147 changed files with 2,603 additions and 1,625 deletions.
8 changes: 8 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1469,6 +1469,14 @@
Permit 'security.evm' to be updated regardless of
current integrity status.

early_page_ext [KNL] Enforces page_ext initialization to earlier
stages so cover more early boot allocations.
Please note that as side effect some optimizations
might be disabled to achieve that (e.g. parallelized
memory initialization is disabled) so the boot process
might take longer, especially on systems with a lot of
memory. Available with CONFIG_PAGE_EXTENSION=y.

failslab=
fail_usercopy=
fail_page_alloc=
Expand Down
10 changes: 5 additions & 5 deletions Documentation/admin-guide/mm/cma_debugfs.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,10 @@ CMA Debugfs Interface
The CMA debugfs interface is useful to retrieve basic information out of the
different CMA areas and to test allocation/release in each of the areas.

Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
kernel's CMA index. So the first CMA zone would be:
Each CMA area represents a directory under <debugfs>/cma/, represented by
its CMA name like below:

<debugfs>/cma/cma-0
<debugfs>/cma/<cma_name>

The structure of the files created under that directory is as follows:

Expand All @@ -18,8 +18,8 @@ The structure of the files created under that directory is as follows:
- [RO] bitmap: The bitmap of page states in the zone.
- [WO] alloc: Allocate N pages from that CMA area. For example::

echo 5 > <debugfs>/cma/cma-2/alloc
echo 5 > <debugfs>/cma/<cma_name>/alloc

would try to allocate 5 pages from the cma-2 area.
would try to allocate 5 pages from the 'cma_name' area.

- [WO] free: Free N pages from that CMA area, similar to the above.
41 changes: 38 additions & 3 deletions Documentation/admin-guide/mm/userfaultfd.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
Design
======

Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
Userspace creates a new userfaultfd, initializes it, and registers one or more
regions of virtual memory with it. Then, any page faults which occur within the
region(s) result in a message being delivered to the userfaultfd, notifying
userspace of the fault.

The ``userfaultfd`` (aside from registering and unregistering virtual
memory ranges) provides two primary functionalities:
Expand All @@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the
``userfaultfd`` runtime load never takes the mmap_lock for writing).

Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span
Terabytes. Too many vmas would be needed for that.

The ``userfaultfd`` once opened by invoking the syscall, can also be
The ``userfaultfd``, once created, can also be
passed using unix domain sockets to a manager process, so the same
manager process could handle the userfaults of a multitude of
different processes without them being aware about what is going on
Expand All @@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
API
===

Creating a userfaultfd
----------------------

There are two ways to create a new userfaultfd, each of which provide ways to
restrict access to this functionality (since historically userfaultfds which
handle kernel page faults have been a useful tool for exploiting the kernel).

The first way, supported since userfaultfd was introduced, is the
userfaultfd(2) syscall. Access to this is controlled in several ways:

- Any user can always create a userfaultfd which traps userspace page faults
only. Such a userfaultfd can be created using the userfaultfd(2) syscall
with the flag UFFD_USER_MODE_ONLY.

- In order to also trap kernel page faults for the address space, either the
process needs the CAP_SYS_PTRACE capability, or the system must have
vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
is set to 0.

The second way, added to the kernel more recently, is by opening
/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
yields equivalent userfaultfds to the userfaultfd(2) syscall.

Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
filesystem permissions (user/group/mode), which gives fine grained access to
userfaultfd specifically, without also granting other unrelated privileges at
the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
vm.unprivileged_userfaultfd is not considered.

Initializing a userfaultfd
--------------------------

When first opened the ``userfaultfd`` must be enabled invoking the
``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
a later API version) which will specify the ``read/POLLIN`` protocol
Expand Down
11 changes: 11 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
Original file line number Diff line number Diff line change
Expand Up @@ -635,6 +635,17 @@ different types of memory (represented as different NUMA nodes) to
place the hot pages in the fast memory. This is implemented based on
unmapping and page fault too.

numa_balancing_promote_rate_limit_MBps
======================================

Too high promotion/demotion throughput between different memory types
may hurt application latency. This can be used to rate limit the
promotion throughput. The per-node max promotion throughput in MB/s
will be limited to be no more than the set value.

A rule of thumb is to set this to less than 1/10 of the PMEM node
write bandwidth.

oops_all_cpu_backtrace
======================

Expand Down
3 changes: 3 additions & 0 deletions Documentation/admin-guide/sysctl/vm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -926,6 +926,9 @@ calls without any restrictions.

The default value is 0.

Another way to control permissions for userfaultfd is to use
/dev/userfaultfd instead of userfaultfd(2). See
Documentation/admin-guide/mm/userfaultfd.rst.

user_reserve_kbytes
===================
Expand Down
5 changes: 5 additions & 0 deletions Documentation/mm/page_owner.rst
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,11 @@ Usage
Page allocated via order XXX, ...
PFN XXX ...
// Detailed stack
By default, it will do full pfn dump, to start with a given pfn,
page_owner supports fseek.

FILE *fp = fopen("/sys/kernel/debug/page_owner", "r");
fseek(fp, pfn_start, SEEK_SET);

The ``page_owner_sort`` tool ignores ``PFN`` rows, puts the remaining rows
in buf, uses regexp to extract the page order value, counts the times
Expand Down
2 changes: 2 additions & 0 deletions arch/alpha/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,8 @@

#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */

#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */

/* compatibility flags */
#define MAP_FILE 0

Expand Down
2 changes: 1 addition & 1 deletion arch/arc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -554,7 +554,7 @@ config ARC_BUILTIN_DTB_NAME

endmenu # "ARC Architecture Configuration"

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if ARC_HUGEPAGE_16M
default "11"
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -1362,7 +1362,7 @@ config ARM_MODULE_PLTS
Disabling this is usually safe for small single-platform
configurations. If unsure, say y.

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if SOC_AM33XX
default "9" if SA1111
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/imx_v6_v7_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
CONFIG_SMP=y
CONFIG_ARM_PSCI=y
CONFIG_HIGHMEM=y
CONFIG_FORCE_MAX_ZONEORDER=14
CONFIG_ARCH_FORCE_MAX_ORDER=14
CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/milbeaut_m10v_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
# CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
# CONFIG_ARM_PATCH_IDIV is not set
CONFIG_HIGHMEM=y
CONFIG_FORCE_MAX_ZONEORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_SECCOMP=y
CONFIG_KEXEC=y
CONFIG_EFI=y
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/oxnas_v6_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ CONFIG_ARCH_OXNAS=y
CONFIG_MACH_OX820=y
CONFIG_SMP=y
CONFIG_NR_CPUS=16
CONFIG_FORCE_MAX_ZONEORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_SECCOMP=y
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/pxa_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ CONFIG_MACH_AKITA=y
CONFIG_MACH_BORZOI=y
CONFIG_PXA_SYSTEMS_CPLDS=y
CONFIG_AEABI=y
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_ARCH_FORCE_MAX_ORDER=9
CONFIG_CMDLINE="root=/dev/ram0 ro"
CONFIG_KEXEC=y
CONFIG_CPU_FREQ=y
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/sama7_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
# CONFIG_CACHE_L2X0 is not set
# CONFIG_ARM_PATCH_IDIV is not set
# CONFIG_CPU_SW_DOMAIN_PAN is not set
CONFIG_FORCE_MAX_ZONEORDER=15
CONFIG_ARCH_FORCE_MAX_ORDER=15
CONFIG_UACCESS_WITH_MEMCPY=y
# CONFIG_ATAGS is not set
CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/configs/sp7021_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ CONFIG_ARCH_SUNPLUS=y
# CONFIG_VDSO is not set
CONFIG_SMP=y
CONFIG_THUMB2_KERNEL=y
CONFIG_FORCE_MAX_ZONEORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_VFP=y
CONFIG_NEON=y
CONFIG_MODULES=y
Expand Down
2 changes: 1 addition & 1 deletion arch/arm64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -1418,7 +1418,7 @@ config XEN
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int
default "14" if ARM64_64K_PAGES
default "12" if ARM64_16K_PAGES
Expand Down
2 changes: 1 addition & 1 deletion arch/csky/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -332,7 +332,7 @@ config HIGHMEM
select KMAP_LOCAL
default y

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "11"

Expand Down
2 changes: 1 addition & 1 deletion arch/ia64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ config IA64_CYCLONE
Say Y here to enable support for IBM EXA Cyclone time source.
If you're unsure, answer N.

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "MAX_ORDER (11 - 17)" if !HUGETLB_PAGE
range 11 17 if !HUGETLB_PAGE
default "17" if HUGETLB_PAGE
Expand Down
6 changes: 3 additions & 3 deletions arch/ia64/include/asm/sparsemem.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@

#define SECTION_SIZE_BITS (30)
#define MAX_PHYSMEM_BITS (50)
#ifdef CONFIG_FORCE_MAX_ZONEORDER
#if ((CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
#ifdef CONFIG_ARCH_FORCE_MAX_ORDER
#if ((CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS)
#undef SECTION_SIZE_BITS
#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)
#define SECTION_SIZE_BITS (CONFIG_ARCH_FORCE_MAX_ORDER - 1 + PAGE_SHIFT)
#endif
#endif

Expand Down
2 changes: 1 addition & 1 deletion arch/loongarch/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -381,7 +381,7 @@ config NODES_SHIFT
default "6"
depends on NUMA

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 14 64 if PAGE_SIZE_64KB
default "14" if PAGE_SIZE_64KB
Expand Down
2 changes: 1 addition & 1 deletion arch/m68k/Kconfig.cpu
Original file line number Diff line number Diff line change
Expand Up @@ -397,7 +397,7 @@ config SINGLE_MEMORY_CHUNK
order" to save memory that could be wasted for unused memory map.
Say N if not sure.

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order" if ADVANCED
depends on !SINGLE_MEMORY_CHUNK
default "11"
Expand Down
2 changes: 1 addition & 1 deletion arch/mips/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -2140,7 +2140,7 @@ config PAGE_SIZE_64KB

endchoice

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 14 64 if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
default "14" if MIPS_HUGE_TLB_SUPPORT && PAGE_SIZE_64KB
Expand Down
2 changes: 2 additions & 0 deletions arch/mips/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,8 @@

#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */

#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */

/* compatibility flags */
#define MAP_FILE 0

Expand Down
2 changes: 1 addition & 1 deletion arch/nios2/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ menu "Kernel features"

source "kernel/Kconfig.hz"

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 9 20
default "11"
Expand Down
2 changes: 2 additions & 0 deletions arch/parisc/include/uapi/asm/mman.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,8 @@
#define MADV_WIPEONFORK 71 /* Zero memory on fork, child only */
#define MADV_KEEPONFORK 72 /* Undo MADV_WIPEONFORK */

#define MADV_COLLAPSE 73 /* Synchronous hugepage collapse */

#define MADV_HWPOISON 100 /* poison a page for testing */
#define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */

Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -845,7 +845,7 @@ config DATA_SHIFT
in that case. If PIN_TLB is selected, it must be aligned to 8M as
8M pages will be pinned.

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
default "9" if PPC64 && PPC_64K_PAGES
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/configs/85xx/ge_imp3a_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ CONFIG_PREEMPT=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_BINFMT_MISC=m
CONFIG_MATH_EMULATION=y
CONFIG_FORCE_MAX_ZONEORDER=17
CONFIG_ARCH_FORCE_MAX_ORDER=17
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_MSI=y
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/configs/fsl-emb-nonhw.config
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ CONFIG_FIXED_PHY=y
CONFIG_FONT_8x16=y
CONFIG_FONT_8x8=y
CONFIG_FONTS=y
CONFIG_FORCE_MAX_ZONEORDER=13
CONFIG_ARCH_FORCE_MAX_ORDER=13
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAME_WARN=1024
CONFIG_FTL=y
Expand Down
10 changes: 0 additions & 10 deletions arch/s390/mm/hugetlbpage.c
Original file line number Diff line number Diff line change
Expand Up @@ -237,16 +237,6 @@ int pud_huge(pud_t pud)
return pud_large(pud);
}

struct page *
follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags)
{
if (flags & FOLL_GET)
return NULL;

return pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
}

bool __init arch_hugetlb_valid_size(unsigned long size)
{
if (MACHINE_HAS_EDAT1 && size == PMD_SIZE)
Expand Down
2 changes: 1 addition & 1 deletion arch/sh/configs/ecovec24_defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
CONFIG_CPU_SUBTYPE_SH7724=y
CONFIG_FORCE_MAX_ZONEORDER=12
CONFIG_ARCH_FORCE_MAX_ORDER=12
CONFIG_MEMORY_SIZE=0x10000000
CONFIG_FLATMEM_MANUAL=y
CONFIG_SH_ECOVEC=y
Expand Down
2 changes: 1 addition & 1 deletion arch/sh/mm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ config PAGE_OFFSET
default "0x80000000" if MMU
default "0x00000000"

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
range 9 64 if PAGE_SIZE_16KB
default "9" if PAGE_SIZE_16KB
Expand Down
2 changes: 1 addition & 1 deletion arch/sparc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -269,7 +269,7 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64

config FORCE_MAX_ZONEORDER
config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "13"
help
Expand Down
Loading

0 comments on commit 818ae0d

Please sign in to comment.