Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The RCU updates for this cycle were:

   - RCU-tasks update, including addition of RCU Tasks Trace for BPF use
     and TASKS_RUDE_RCU

   - kfree_rcu() updates.

   - Remove scheduler locking restriction

   - RCU CPU stall warning updates.

   - Torture-test updates.

   - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  rcu: Allow for smp_call_function() running callbacks from idle
  rcu: Provide rcu_irq_exit_check_preempt()
  rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
  rcu: Provide __rcu_is_watching()
  rcu: Provide rcu_irq_exit_preempt()
  rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  rcu/tree: Mark the idle relevant functions noinstr
  x86: Replace ist_enter() with nmi_enter()
  x86/mce: Send #MC signal from task work
  x86/entry: Get rid of ist_begin/end_non_atomic()
  sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
  lockdep: Always inline lockdep_{off,on}()
  hardirq/nmi: Allow nested nmi_enter()
  arm64: Prepare arch_nmi_enter() for recursion
  printk: Disallow instrumenting print_nmi_enter()
  printk: Prepare for nested printk_nmi_enter()
  rcutorture: Convert ULONG_CMP_LT() to time_before()
  torture: Add a --kasan argument
  torture: Save a few lines by using config_override_param initially
  ...
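
For context on the API whose internals this series updates, a typical kfree_rcu() call site follows the pattern sketched below (illustrative only; struct foo, foo_lock, and global_foo are hypothetical):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct foo {
            int data;
            struct rcu_head rcu;            /* required by kfree_rcu() */
    };

    static struct foo __rcu *global_foo;
    static DEFINE_SPINLOCK(foo_lock);

    static void replace_foo(struct foo *newp)
    {
            struct foo *old;

            spin_lock(&foo_lock);
            old = rcu_dereference_protected(global_foo,
                                            lockdep_is_held(&foo_lock));
            rcu_assign_pointer(global_foo, newp);
            spin_unlock(&foo_lock);
            if (old)
                    kfree_rcu(old, rcu);    /* freed only after a grace period */
    }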
Linus Torvalds committed Jun 1, 2020
2 parents 0bd957e + cb3cb67 commit 2227e5b
Showing 62 changed files with 2,537 additions and 948 deletions.
61 changes: 16 additions & 45 deletions Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
Scheduler and RCU
~~~~~~~~~~~~~~~~~

-RCU depends on the scheduler, and the scheduler uses RCU to protect some
-of its data structures. The preemptible-RCU ``rcu_read_unlock()``
-implementation must therefore be written carefully to avoid deadlocks
-involving the scheduler's runqueue and priority-inheritance locks. In
-particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
-interrupt handler invokes both ``rcu_read_lock()`` and
-``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
-to use negative nesting levels to avoid destructive recursion via
-interrupt handler's use of RCU.
-
-This scheduler-RCU requirement came as a `complete
-surprise <https://lwn.net/Articles/453002/>`__.
-
-As noted above, RCU makes use of kthreads, and it is necessary to avoid
-excessive CPU-time accumulation by these kthreads. This requirement was
-no surprise, but RCU's violation of it when running context-switch-heavy
-workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
-surprise
+RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
+accumulation by these kthreads. This requirement was no surprise, but
+RCU's violation of it when running context-switch-heavy workloads when
+built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
[PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
RCU has made good progress towards meeting this requirement, even for
context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
room for further improvement.

-It is forbidden to hold any of scheduler's runqueue or
-priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
-interrupts have been disabled across the entire RCU read-side critical
-section, that is, up to and including the matching ``rcu_read_lock()``.
-Violating this restriction can result in deadlocks involving these
-scheduler spinlocks. There was hope that this restriction might be
-lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
-deferring the reporting of the resulting RCU-preempt quiescent state
-until the end of the corresponding interrupts-disabled region.
-Unfortunately, timely reporting of the corresponding quiescent state to
-expedited grace periods requires a call to ``raise_softirq()``, which
-can acquire these scheduler spinlocks. In addition, real-time systems
-using RCU priority boosting need this restriction to remain in effect
-because deferred quiescent-state reporting would also defer deboosting,
-which in turn would degrade real-time latencies.
-
-In theory, if a given RCU read-side critical section could be guaranteed
-to be less than one second in duration, holding a scheduler spinlock
-across that critical section's ``rcu_read_unlock()`` would require only
-that preemption be disabled across the entire RCU read-side critical
-section, not interrupts. Unfortunately, given the possibility of vCPU
-preemption, long-running interrupts, and so on, it is not possible in
-practice to guarantee that a given RCU read-side critical section will
-complete in less than one second. Therefore, as noted above, if
-scheduler spinlocks are held across a given call to
-``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
-read-side critical section.
+There is no longer any prohibition against holding any of the
+scheduler's runqueue or priority-inheritance spinlocks across an
+``rcu_read_unlock()``, even if interrupts and preemption were enabled
+somewhere within the corresponding RCU read-side critical section.
+Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
+with preemption enabled, acquire one of the scheduler locks, and hold
+that lock across the matching ``rcu_read_unlock()``.
+
+Similarly, the RCU flavor consolidation has removed the need for negative
+nesting. The fact that interrupt-disabled regions of code act as RCU
+read-side critical sections implicitly avoids earlier issues that used
+to result in destructive recursion via an interrupt handler's use of RCU.

Tracing and RCU
~~~~~~~~~~~~~~~
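
To make the relaxed rule concrete, here is a minimal sketch (not from the patch; my_lock and do_stats_update() are hypothetical stand-ins for a scheduler runqueue or priority-inheritance lock and the work done under it) of the pattern that is now legal:

    #include <linux/rcupdate.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(my_lock);

    static void reader(void)
    {
            rcu_read_lock();        /* preemption may be enabled here */
            /* ... access RCU-protected data ... */
            spin_lock(&my_lock);    /* acquired inside the critical section */
            rcu_read_unlock();      /* legal even though my_lock is still held */
            do_stats_update();      /* hypothetical work done under my_lock */
            spin_unlock(&my_lock);
    }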
19 changes: 19 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -4210,12 +4210,24 @@
			Duration of CPU stall (s) to test RCU CPU stall
			warnings, zero to disable.

+	rcutorture.stall_cpu_block= [KNL]
+			Sleep while stalling if set. This will result
+			in warnings from preemptible RCU in addition
+			to any other stall-related activity.
+
	rcutorture.stall_cpu_holdoff= [KNL]
			Time to wait (s) after boot before inducing stall.

	rcutorture.stall_cpu_irqsoff= [KNL]
			Disable interrupts while stalling if set.

+	rcutorture.stall_gp_kthread= [KNL]
+			Duration (s) of forced sleep within RCU
+			grace-period kthread to test RCU CPU stall
+			warnings, zero to disable. If both stall_cpu
+			and stall_gp_kthread are specified, the
+			kthread is starved first, then the CPU.
+
	rcutorture.stat_interval= [KNL]
			Time (s) between statistics printk()s.

@@ -4286,6 +4298,13 @@
			only normal grace-period primitives. No effect
			on CONFIG_TINY_RCU kernels.

+	rcupdate.rcu_task_ipi_delay= [KNL]
+			Set time in jiffies during which RCU tasks will
+			avoid sending IPIs, starting with the beginning
+			of a given grace period. Setting a large
+			number avoids disturbing real-time workloads,
+			but lengthens grace periods.
+
	rcupdate.rcu_task_stall_timeout= [KNL]
			Set timeout in jiffies for RCU task stall warning
			messages. Disable with a value less than or equal
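
As a usage illustration (the values below are arbitrary, not taken from this series), the new stall-test parameters might be combined on the kernel boot command line so that the grace-period kthread is starved first and the CPU stall sleeps rather than spins:

    rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.stall_gp_kthread=30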
8 changes: 0 additions & 8 deletions Documentation/trace/ftrace-design.rst
@@ -229,14 +229,6 @@ Adding support for it is easy: just define the macro in asm/ftrace.h and
pass the return address pointer as the 'retp' argument to
ftrace_push_return_trace().

-HAVE_FTRACE_NMI_ENTER
----------------------
-
-If you can't trace NMI functions, then skip this option.
-
-<details to be filled>
-
-
HAVE_SYSCALL_TRACEPOINTS
------------------------

78 changes: 59 additions & 19 deletions arch/arm64/include/asm/hardirq.h
@@ -32,30 +32,70 @@ u64 smp_irq_stat_cpu(unsigned int cpu);

struct nmi_ctx {
	u64 hcr;
+	unsigned int cnt;
};

DECLARE_PER_CPU(struct nmi_ctx, nmi_contexts);

-#define arch_nmi_enter() \
-	do { \
-		if (is_kernel_in_hyp_mode()) { \
-			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
-			nmi_ctx->hcr = read_sysreg(hcr_el2); \
-			if (!(nmi_ctx->hcr & HCR_TGE)) { \
-				write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
-				isb(); \
-			} \
-		} \
-	} while (0)
+#define arch_nmi_enter() \
+do { \
+	struct nmi_ctx *___ctx; \
+	u64 ___hcr; \
+	\
+	if (!is_kernel_in_hyp_mode()) \
+		break; \
+	\
+	___ctx = this_cpu_ptr(&nmi_contexts); \
+	if (___ctx->cnt) { \
+		___ctx->cnt++; \
+		break; \
+	} \
+	\
+	___hcr = read_sysreg(hcr_el2); \
+	if (!(___hcr & HCR_TGE)) { \
+		write_sysreg(___hcr | HCR_TGE, hcr_el2); \
+		isb(); \
+	} \
+	/* \
+	 * Make sure the sysreg write is performed before ___ctx->cnt \
+	 * is set to 1. NMIs that see cnt == 1 will rely on us. \
+	 */ \
+	barrier(); \
+	___ctx->cnt = 1; \
+	/* \
+	 * Make sure ___ctx->cnt is set before we save ___hcr. We \
+	 * don't want ___ctx->hcr to be overwritten. \
+	 */ \
+	barrier(); \
+	___ctx->hcr = ___hcr; \
+} while (0)

-#define arch_nmi_exit() \
-	do { \
-		if (is_kernel_in_hyp_mode()) { \
-			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
-			if (!(nmi_ctx->hcr & HCR_TGE)) \
-				write_sysreg(nmi_ctx->hcr, hcr_el2); \
-		} \
-	} while (0)
+#define arch_nmi_exit() \
+do { \
+	struct nmi_ctx *___ctx; \
+	u64 ___hcr; \
+	\
+	if (!is_kernel_in_hyp_mode()) \
+		break; \
+	\
+	___ctx = this_cpu_ptr(&nmi_contexts); \
+	___hcr = ___ctx->hcr; \
+	/* \
+	 * Make sure we read ___ctx->hcr before we release \
+	 * ___ctx->cnt as it makes ___ctx->hcr updatable again. \
+	 */ \
+	barrier(); \
+	___ctx->cnt--; \
+	/* \
+	 * Make sure ___ctx->cnt release is visible before we \
+	 * restore the sysreg. Otherwise a new NMI occurring \
+	 * right after write_sysreg() can be fooled and think \
+	 * we secured things for it. \
+	 */ \
+	barrier(); \
+	if (!___ctx->cnt && !(___hcr & HCR_TGE)) \
+		write_sysreg(___hcr, hcr_el2); \
+} while (0)

static inline void ack_bad_irq(unsigned int irq)
{
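
The new cnt field makes the enter/exit pair recursion-safe: only the outermost arch_nmi_enter() saves and modifies HCR_EL2, and only the matching outermost arch_nmi_exit() restores it. A stand-alone C analogue of that save/count/restore logic (hypothetical names; a plain flag stands in for the HCR_TGE bit, and the compiler barriers are omitted because nothing can interleave in this single-threaded sketch):

    #include <stdio.h>

    static unsigned int cnt;        /* nesting depth, like nmi_ctx::cnt */
    static int hw_state;            /* stands in for the HCR_TGE bit */
    static int saved_state;         /* like nmi_ctx::hcr */

    static void ctx_enter(void)
    {
            if (cnt++)              /* nested: state already saved by outer call */
                    return;
            saved_state = hw_state; /* outermost: save, then modify */
            hw_state = 1;
    }

    static void ctx_exit(void)
    {
            if (--cnt)              /* still nested: leave state alone */
                    return;
            hw_state = saved_state; /* outermost exit: restore */
    }

    int main(void)
    {
            ctx_enter();            /* outermost "NMI" */
            ctx_enter();            /* nested "NMI" */
            ctx_exit();
            ctx_exit();
            printf("restored state: %d\n", hw_state);   /* prints 0 */
            return 0;
    }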
14 changes: 2 additions & 12 deletions arch/arm64/kernel/sdei.c
@@ -251,22 +251,12 @@ asmlinkage __kprobes notrace unsigned long
__sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
{
	unsigned long ret;
-	bool do_nmi_exit = false;

-	/*
-	 * nmi_enter() deals with printk() re-entrance and use of RCU when
-	 * RCU believed this CPU was idle. Because critical events can
-	 * interrupt normal events, we may already be in_nmi().
-	 */
-	if (!in_nmi()) {
-		nmi_enter();
-		do_nmi_exit = true;
-	}
+	nmi_enter();

	ret = _sdei_handler(regs, arg);

-	if (do_nmi_exit)
-		nmi_exit();
+	nmi_exit();

	return ret;
}
8 changes: 2 additions & 6 deletions arch/arm64/kernel/traps.c
@@ -906,17 +906,13 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)

asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
{
-	const bool was_in_nmi = in_nmi();
-
-	if (!was_in_nmi)
-		nmi_enter();
+	nmi_enter();

	/* non-RAS errors are not containable */
	if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
		arm64_serror_panic(regs, esr);

-	if (!was_in_nmi)
-		nmi_exit();
+	nmi_exit();
}

asmlinkage void enter_from_user_mode(void)
22 changes: 6 additions & 16 deletions arch/powerpc/kernel/traps.c
@@ -441,15 +441,9 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
void system_reset_exception(struct pt_regs *regs)
{
	unsigned long hsrr0, hsrr1;
-	bool nested = in_nmi();
	bool saved_hsrrs = false;

-	/*
-	 * Avoid crashes in case of nested NMI exceptions. Recoverability
-	 * is determined by RI and in_nmi
-	 */
-	if (!nested)
-		nmi_enter();
+	nmi_enter();

	/*
	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
@@ -521,8 +515,7 @@ void system_reset_exception(struct pt_regs *regs)
		mtspr(SPRN_HSRR1, hsrr1);
	}

-	if (!nested)
-		nmi_exit();
+	nmi_exit();

	/* What should we do here? We could issue a shutdown or hard reset. */
}
@@ -823,9 +816,8 @@ int machine_check_generic(struct pt_regs *regs)
void machine_check_exception(struct pt_regs *regs)
{
	int recover = 0;
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+
+	nmi_enter();

	__this_cpu_inc(irq_stat.mce_exceptions);

@@ -851,8 +843,7 @@ void machine_check_exception(struct pt_regs *regs)
	if (check_io_access(regs))
		goto bail;

-	if (!nested)
-		nmi_exit();
+	nmi_exit();

	die("Machine check", regs, SIGBUS);

@@ -863,8 +854,7 @@ void machine_check_exception(struct pt_regs *regs)
	return;

bail:
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
}

void SMIException(struct pt_regs *regs)
1 change: 0 additions & 1 deletion arch/sh/Kconfig
@@ -71,7 +71,6 @@ config SUPERH32
	select HAVE_FUNCTION_TRACER
	select HAVE_FTRACE_MCOUNT_RECORD
	select HAVE_DYNAMIC_FTRACE
-	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
	select ARCH_WANT_IPC_PARSE_VERSION
	select HAVE_FUNCTION_GRAPH_TRACER
	select HAVE_ARCH_KGDB
12 changes: 12 additions & 0 deletions arch/sh/kernel/traps.c
@@ -170,11 +170,21 @@ BUILD_TRAP_HANDLER(bug)
	force_sig(SIGTRAP);
}

+#ifdef CONFIG_DYNAMIC_FTRACE
+extern void arch_ftrace_nmi_enter(void);
+extern void arch_ftrace_nmi_exit(void);
+#else
+static inline void arch_ftrace_nmi_enter(void) { }
+static inline void arch_ftrace_nmi_exit(void) { }
+#endif
+
BUILD_TRAP_HANDLER(nmi)
{
	unsigned int cpu = smp_processor_id();
	TRAP_HANDLER_DECL;

+	arch_ftrace_nmi_enter();
+
	nmi_enter();
	nmi_count(cpu)++;

@@ -190,4 +200,6 @@ BUILD_TRAP_HANDLER(nmi)
	}

	nmi_exit();
+
+	arch_ftrace_nmi_exit();
}
5 changes: 0 additions & 5 deletions arch/x86/include/asm/traps.h
@@ -118,11 +118,6 @@ void smp_spurious_interrupt(struct pt_regs *regs);
void smp_error_interrupt(struct pt_regs *regs);
asmlinkage void smp_irq_move_cleanup_interrupt(void);

-extern void ist_enter(struct pt_regs *regs);
-extern void ist_exit(struct pt_regs *regs);
-extern void ist_begin_non_atomic(struct pt_regs *regs);
-extern void ist_end_non_atomic(void);
-
#ifdef CONFIG_VMAP_STACK
void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,