Merge branch 'akpm' (patches from Andrew)
Merge still more updates from Andrew Morton:
 "Various trees. Mainly those parts of MM whose linux-next dependents
  are now merged. I'm still sitting on ~160 patches which await merges
  from -next.

  Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug,
  panic, lib, sysctl, mm/gup, mm/pagemap"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (52 commits)
  doc: cgroup: update note about conditions when oom killer is invoked
  module: move the set_fs hack for flush_icache_range to m68k
  nommu: use flush_icache_user_range in brk and mmap
  binfmt_flat: use flush_icache_user_range
  exec: use flush_icache_user_range in read_code
  exec: only build read_code when needed
  m68k: implement flush_icache_user_range
  arm: rename flush_cache_user_range to flush_icache_user_range
  xtensa: implement flush_icache_user_range
  sh: implement flush_icache_user_range
  asm-generic: add a flush_icache_user_range stub
  mm: rename flush_icache_user_range to flush_icache_user_page
  arm,sparc,unicore32: remove flush_icache_user_range
  riscv: use asm-generic/cacheflush.h
  powerpc: use asm-generic/cacheflush.h
  openrisc: use asm-generic/cacheflush.h
  m68knommu: use asm-generic/cacheflush.h
  microblaze: use asm-generic/cacheflush.h
  ia64: use asm-generic/cacheflush.h
  hexagon: use asm-generic/cacheflush.h
  ...
Linus Torvalds committed Jun 8, 2020
2 parents 63d72b9 + db33ec3 commit 20b0d06
Showing 75 changed files with 701 additions and 493 deletions.
17 changes: 8 additions & 9 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1170,6 +1170,13 @@ PAGE_SIZE multiple when read back.
Under certain circumstances, the usage may go over the limit
temporarily.

In the default configuration, regular 0-order allocations always
succeed unless the OOM killer chooses the current task as a victim.

Some kinds of allocations don't invoke the OOM killer. Their
callers could retry them differently, return into userspace
as -ENOMEM, or silently ignore them in cases like disk readahead.

This is the ultimate protection mechanism. As long as the
high limit is used and monitored properly, this limit's
utility is limited to providing the final safety net.
@@ -1226,17 +1233,9 @@ PAGE_SIZE multiple when read back.
The number of times the cgroup's memory usage reached
the limit and allocation was about to fail.

Depending on the context, the result could be an invocation of the
OOM killer and a retried allocation, or a failed allocation.

A failed allocation could in turn be returned into
userspace as -ENOMEM or silently ignored in cases like
disk readahead. For now, OOM in a memory cgroup kills
tasks only if the shortage has happened inside a page fault.

This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
allocations.
allocations or if the caller asked not to retry attempts.

oom_kill
The number of processes belonging to this cgroup
5 changes: 5 additions & 0 deletions Documentation/admin-guide/dynamic-debug-howto.rst
@@ -13,6 +13,11 @@ kernel code to obtain additional kernel information. Currently, if
``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
enabled per-callsite.

If you do not want to enable dynamic debug globally (e.g. in some embedded
systems), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` for basic support of
dynamic debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` to the Makefile of
any modules which you'd like to debug dynamically later (see the sketch after
this file's diff).

If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just a
shortcut for ``print_hex_dump(KERN_DEBUG)``.

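As a rough illustration of the per-module workflow described above (a
minimal sketch; the module name is hypothetical), a module compiled with
the ccflags addition keeps its pr_debug() sites controllable through
<debugfs>/dynamic_debug/control even when only CONFIG_DYNAMIC_DEBUG_CORE
is set:

    /* demo.c -- hypothetical module built with the ccflags addition
     * described above, on a kernel with CONFIG_DYNAMIC_DEBUG_CORE=y. */
    #include <linux/module.h>
    #include <linux/printk.h>

    static int __init demo_init(void)
    {
            /* Off by default; enable at runtime with:
             *   echo 'module demo +p' > <debugfs>/dynamic_debug/control */
            pr_debug("demo: loaded\n");
            return 0;
    }

    static void __exit demo_exit(void)
    {
            pr_debug("demo: unloaded\n");
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");
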
8 changes: 8 additions & 0 deletions Documentation/admin-guide/kdump/kdump.rst
@@ -521,6 +521,14 @@ will cause a kdump to occur at the panic() call. In cases where a user wants
to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
to achieve the same behaviour.

Trigger Kdump on add_taint()
============================

The kernel parameter panic_on_taint enables a conditional call to panic()
from within add_taint() whenever the bit flag being set by add_taint()
matches the bitmask set in this parameter.
This will cause a kdump to occur at the add_taint()->panic() call.
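
A minimal userspace sketch of the bitmask match described above
(illustrative only; add_taint() and the mask value here are stand-ins,
not the kernel's symbols):

    #include <stdio.h>

    static unsigned long panic_on_taint = 0x20;  /* stand-in mask: taint bit 5 */

    static void add_taint(unsigned int flag)     /* stand-in for the kernel's add_taint() */
    {
            if (panic_on_taint & (1UL << flag))
                    printf("taint bit %u matches mask 0x%lx -> panic() -> kdump\n",
                           flag, panic_on_taint);
            else
                    printf("taint bit %u set, no panic\n", flag);
    }

    int main(void)
    {
            add_taint(2);   /* not in the mask: no panic */
            add_taint(5);   /* in the mask: would panic and trigger kdump */
            return 0;
    }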

Contact
=======

34 changes: 28 additions & 6 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -1445,7 +1445,7 @@
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
Format: <integer>
Format: 0 | 1

hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
@@ -1513,9 +1513,9 @@

hung_task_panic=
[KNL] Should the hung task detector generate panics.
Format: <integer>
Format: 0 | 1

A nonzero value instructs the kernel to panic when a
A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
@@ -3447,6 +3447,19 @@
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer

panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: <hex>[,nousertaint]
Hexadecimal bitmask representing the set of TAINT flags
that will cause the kernel to panic when add_taint() is
called with any of the flags in this set.
The optional switch "nousertaint" can be used to prevent
userspace-forced crashes: with it, any write to the sysctl
/proc/sys/kernel/tainted whose flag set matches the
panic_on_taint bitmask is rejected.
See Documentation/admin-guide/tainted-kernels.rst for
extra details on the taint flags that users can pick
to compose the bitmask to assign to panic_on_taint.

panic_on_warn panic() instead of WARN(). Useful to cause kdump
on a WARN().

@@ -4652,9 +4665,9 @@

softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
Format: <integer>
Format: 0 | 1

A nonzero value instructs the soft-lockup detector
A value of 1 instructs the soft-lockup detector
to panic the machine when a soft-lockup occurs. It is
also controlled by the kernel.softlockup_panic sysctl
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
@@ -4663,7 +4676,7 @@
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
Format: <integer>
Format: 0 | 1

sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/admin-guide/laptops/sonypi.rst
@@ -4956,6 +4969,15 @@

switches= [HW,M68k]

sysctl.*= [KNL]
Set a sysctl parameter, right before loading the init
process, as if the value was written to the respective
/proc/sys/... file. Both '.' and '/' are recognized as
separators. Unrecognized parameters and invalid values
are reported in the kernel log. Sysctls registered
later by a loaded module cannot be set this way.
Example: sysctl.vm.swappiness=40

sysfs.deprecated=0|1 [KNL]
Enable/disable old style sysfs layout for old udev
on older distributions. When this option is enabled
37 changes: 37 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -335,6 +335,20 @@ Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".


hung_task_all_cpu_backtrace
===========================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.

0: Won't show all CPU backtraces when a hung task is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
a hung task is detected.
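
For illustration, a minimal (hypothetical) userspace snippet that turns
this behaviour on by writing to the proc file named above:

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/sys/kernel/hung_task_all_cpu_backtrace", "w");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            fputs("1\n", f);        /* dump all-CPU backtraces on a hung task */
            fclose(f);
            return 0;
    }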


hung_task_panic
===============

@@ -632,6 +646,22 @@ rate for each task.
scanned for a given scan.


oops_all_cpu_backtrace
======================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when an oops event occurs. It should be used as a last
resort in case a panic cannot be triggered (for example, to protect
running VMs) or a kdump can't be collected. This file shows up if
CONFIG_SMP is enabled.

0: Won't show all CPU backtraces when an oops is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
an oops event is detected.


osrelease, ostype & version
===========================

@@ -1239,6 +1269,13 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.

See :doc:`/admin-guide/tainted-kernels` for more information.

Note:
writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
and any of the ORed-together values being written to ``tainted`` match the
bitmask declared via panic_on_taint.
See :doc:`/admin-guide/kernel-parameters` for more details on that particular
kernel command line option and its optional ``nousertaint`` switch.

threads-max
===========
51 changes: 37 additions & 14 deletions Documentation/core-api/pin_user_pages.rst
@@ -148,23 +148,46 @@ NOTE: Some pages, such as DAX pages, cannot be pinned with longterm pins. That's
because DAX pages do not have a separate page cache, and so "pinning" implies
locking down file system blocks, which is not (yet) supported in that way.

CASE 3: Hardware with page faulting support
-------------------------------------------
Here, a well-written driver doesn't normally need to pin pages at all. However,
if the driver does choose to do so, it can register MMU notifiers for the range,
and will be called back upon invalidation. Either way (avoiding page pinning, or
using MMU notifiers to unpin upon request), there is proper synchronization with
both filesystem and mm (page_mkclean(), munmap(), etc).

Therefore, neither flag needs to be set.

In this case, ideally, neither get_user_pages() nor pin_user_pages() should be
called. Instead, the software should be written so that it does not pin pages.
This allows mm and filesystems to operate more efficiently and reliably.
CASE 3: MMU notifier registration, with or without page faulting hardware
-------------------------------------------------------------------------
Device drivers can pin pages via get_user_pages*(), and register for mmu
notifier callbacks for the memory range. Then, upon receiving a notifier
"invalidate range" callback, stop the device from using the range, and unpin
the pages. There may be other possible schemes, such as explicitly
synchronizing against pending IO, that accomplish approximately the same thing.

Or, if the hardware supports replayable page faults, then the device driver can
avoid pinning entirely (this is ideal), as follows: register for mmu notifier
callbacks as above, but instead of stopping the device and unpinning in the
callback, simply remove the range from the device's page tables.

Either way, as long as the driver unpins the pages upon mmu notifier callback,
then there is proper synchronization with both filesystem and mm
(page_mkclean(), munmap(), etc). Therefore, neither flag needs to be set.
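
A rough kernel-style sketch of the unpin-on-invalidate scheme described
above (the demo_* structure and helpers are assumed for illustration;
only the mmu_notifier pieces are real API):

    #include <linux/mmu_notifier.h>

    struct demo_dev {
            struct mmu_notifier mn;
            /* ... device state, list of pinned pages, etc. ... */
    };

    /* Assumed driver helpers: quiesce DMA and drop pins in [start, end). */
    static void demo_stop_device_access(struct demo_dev *d,
                                        unsigned long start, unsigned long end) { }
    static void demo_unpin_range(struct demo_dev *d,
                                 unsigned long start, unsigned long end) { }

    static int demo_invalidate(struct mmu_notifier *mn,
                               const struct mmu_notifier_range *range)
    {
            struct demo_dev *d = container_of(mn, struct demo_dev, mn);

            demo_stop_device_access(d, range->start, range->end);
            demo_unpin_range(d, range->start, range->end);
            return 0;
    }

    static const struct mmu_notifier_ops demo_mn_ops = {
            .invalidate_range_start = demo_invalidate,
    };

    /* Registration, e.g. at device-open time:
     *   d->mn.ops = &demo_mn_ops;
     *   mmu_notifier_register(&d->mn, current->mm);
     */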

CASE 4: Pinning for struct page manipulation only
-------------------------------------------------
Here, normal GUP calls are sufficient, so neither flag needs to be set.
If only struct page data (as opposed to the actual memory contents that a page
is tracking) is affected, then normal GUP calls are sufficient, and neither flag
needs to be set.

CASE 5: Pinning in order to write to the data within the page
-------------------------------------------------------------
Even though neither DMA nor Direct IO is involved, just a simple case of "pin,
write to a page's data, unpin" can cause a problem. Case 5 may be considered a
superset of Case 1, plus Case 2, plus anything that invokes that pattern. In
other words, if the code is neither Case 1 nor Case 2, it may still require
FOLL_PIN, for patterns like this:

Correct (uses FOLL_PIN calls):
pin_user_pages()
write to the data within the pages
unpin_user_pages()

INCORRECT (uses FOLL_GET calls):
get_user_pages()
write to the data within the pages
put_page()
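
Fleshing out the correct pattern above into kernel-style C (a sketch;
error paths are trimmed and the one-byte write is an arbitrary example):

    #include <linux/mm.h>
    #include <linux/highmem.h>
    #include <linux/string.h>

    /* Pin one user page, write into it, mark it dirty, unpin. */
    static int write_one_user_page(unsigned long uaddr)
    {
            struct page *page;
            void *kaddr;
            int ret;

            ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, &page);
            if (ret != 1)
                    return ret < 0 ? ret : -EFAULT;

            kaddr = kmap(page);
            memset(kaddr + (uaddr & ~PAGE_MASK), 0, 1);  /* write to the data */
            kunmap(page);

            set_page_dirty_lock(page);
            unpin_user_pages(&page, 1);  /* FOLL_PIN pages are unpinned, not put_page()ed */
            return 0;
    }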

page_maybe_dma_pinned(): the whole point of pinning
===================================================
32 changes: 8 additions & 24 deletions arch/alpha/include/asm/cacheflush.h
@@ -4,19 +4,6 @@

#include <linux/mm.h>

/* Caches aren't brain-dead on the Alpha. */
#define flush_cache_all() do { } while (0)
#define flush_cache_mm(mm) do { } while (0)
#define flush_cache_dup_mm(mm) do { } while (0)
#define flush_cache_range(vma, start, end) do { } while (0)
#define flush_cache_page(vma, vmaddr, pfn) do { } while (0)
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#define flush_dcache_page(page) do { } while (0)
#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)
#define flush_cache_vmap(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) do { } while (0)

/* Note that the following two definitions are _highly_ dependent
on the contexts in which they are used in the kernel. I personally
think it is criminal how loosely defined these macros are. */
@@ -48,7 +35,7 @@ extern void smp_imb(void);

extern void __load_new_mm_context(struct mm_struct *);
static inline void
flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long addr, int len)
{
if (vma->vm_flags & VM_EXEC) {
@@ -59,20 +46,17 @@ flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
mm->context[smp_processor_id()] = 0;
}
}
#else
extern void flush_icache_user_range(struct vm_area_struct *vma,
#define flush_icache_user_page flush_icache_user_page
#else /* CONFIG_SMP */
extern void flush_icache_user_page(struct vm_area_struct *vma,
struct page *page, unsigned long addr, int len);
#endif
#define flush_icache_user_page flush_icache_user_page
#endif /* CONFIG_SMP */

/* This is used only in __do_fault and do_swap_page. */
#define flush_icache_page(vma, page) \
flush_icache_user_range((vma), (page), 0, 0)
flush_icache_user_page((vma), (page), 0, 0)

#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { memcpy(dst, src, len); \
flush_icache_user_range(vma, page, vaddr, len); \
} while (0)
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
memcpy(dst, src, len)
#include <asm-generic/cacheflush.h>

#endif /* _ALPHA_CACHEFLUSH_H */
2 changes: 1 addition & 1 deletion arch/alpha/kernel/smp.c
@@ -740,7 +740,7 @@ ipi_flush_icache_page(void *x)
}

void
flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long addr, int len)
{
struct mm_struct *mm = vma->vm_mm;
7 changes: 2 additions & 5 deletions arch/arm/include/asm/cacheflush.h
@@ -258,11 +258,11 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
#define flush_cache_dup_mm(mm) flush_cache_mm(mm)

/*
* flush_cache_user_range is used when we want to ensure that the
* flush_icache_user_range is used when we want to ensure that the
* Harvard caches are synchronised for the user space address range.
* This is used for the ARM private sys_cacheflush system call.
*/
#define flush_cache_user_range(s,e) __cpuc_coherent_user_range(s,e)
#define flush_icache_user_range(s,e) __cpuc_coherent_user_range(s,e)

/*
* Perform necessary cache operations to ensure that data previously
@@ -318,9 +318,6 @@ extern void flush_kernel_dcache_page(struct page *);
#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)

#define flush_icache_user_range(vma,page,addr,len) \
flush_dcache_page(page)

/*
* We don't appear to need to do anything here. In fact, if we did, we'd
* duplicate cache flushing elsewhere performed by flush_dcache_page().
4 changes: 2 additions & 2 deletions arch/arm/kernel/fiq.c
Original file line number Diff line number Diff line change
@@ -98,8 +98,8 @@ void set_fiq_handler(void *start, unsigned int length)

memcpy(base + offset, start, length);
if (!cache_is_vipt_nonaliasing())
flush_icache_range((unsigned long)base + offset, offset +
length);
flush_icache_range((unsigned long)base + offset,
(unsigned long)base + offset + length);
flush_icache_range(0xffff0000 + offset, 0xffff0000 + offset + length);
}

2 changes: 1 addition & 1 deletion arch/arm/kernel/traps.c
@@ -566,7 +566,7 @@ __do_cache_op(unsigned long start, unsigned long end)
if (fatal_signal_pending(current))
return 0;

ret = flush_cache_user_range(start, start + chunk);
ret = flush_icache_user_range(start, start + chunk);
if (ret)
return ret;

