Merge branch 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:
 "RCU changes for this cycle were:

   - Documentation updates

   - Miscellaneous fixes

   - Offloaded-callbacks updates

   - Updates to the nolibc library

   - Tasks-RCU updates

   - In-kernel torture-test updates

   - Torture-test scripting, perhaps most notably the pinning of
     torture-test guest OSes so as to force differences in memory
     latency. For example, in a two-socket system, a four-CPU guest OS
     will have one pair of its CPUs pinned to threads in a single core
     on one socket and the other pair pinned to threads in a single core
     on the other socket. This approach proved able to force race
     conditions that earlier testing missed. Some of these race
     conditions are still being tracked down"

* 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits)
  torture: Replace deprecated CPU-hotplug functions.
  rcu: Replace deprecated CPU-hotplug functions
  rcu: Print human-readable message for schedule() in RCU reader
  rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU
  rcu: Use per_cpu_ptr to get the pointer of per_cpu variable
  rcu: Remove useless "ret" update in rcu_gp_fqs_loop()
  rcu: Mark accesses in tree_stall.h
  rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack
  rcu: Mark lockless ->qsmask read in rcu_check_boost_fail()
  srcutiny: Mark read-side data races
  rcu: Start timing stall repetitions after warning complete
  rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()
  rcu/tree: Handle VM stoppage in stall detection
  rculist: Unify documentation about missing list_empty_rcu()
  rcu: Mark accesses to ->rcu_read_lock_nesting
  rcu: Weaken ->dynticks accesses and updates
  rcu: Remove special bit at the bottom of the ->dynticks counter
  rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
  rcu: Fix to include first blocked task in stall warning
  torture: Make kvm-test-1-run-qemu.sh check for reboot loops
  ...
Linus Torvalds committed Aug 30, 2021
2 parents 6f01c93 + b770efc commit 4ca4256
Showing 41 changed files with 2,241 additions and 1,773 deletions.
@@ -112,6 +112,35 @@ on PowerPC.
The ``smp_mb__after_unlock_lock()`` invocations prevent this
``WARN_ON()`` from triggering.

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But the chain of rcu_node-structure lock acquisitions guarantees      |
| that new readers will see all of the updater's pre-grace-period       |
| accesses and also guarantees that the updater's post-grace-period     |
| accesses will see all of the old reader's accesses. So why do we      |
| need all of those calls to smp_mb__after_unlock_lock()?               |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we must provide ordering for RCU's polling grace-period       |
| primitives, for example, get_state_synchronize_rcu() and              |
| poll_state_synchronize_rcu(). Consider this code::                    |
|                                                                       |
|    CPU 0                                     CPU 1                    |
|    ----                                      ----                     |
|    WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)         |
|    g = get_state_synchronize_rcu()           smp_mb()                 |
|    while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)        |
|            continue;                                                  |
|    r0 = READ_ONCE(Y)                                                  |
|                                                                       |
| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
| happen, even if CPU 1 is in an RCU extended quiescent state           |
| (idle or offline) and thus won't interact directly with the RCU       |
| core processing at all.                                               |
+-----------------------------------------------------------------------+

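For illustration, here is a minimal C sketch of how updater code might use
these polling primitives in place of a blocking synchronize_rcu(); the
struct foo, remove_element(), and free_element() names are hypothetical,
not kernel APIs::

    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    struct foo;
    void remove_element(struct foo *p);   /* Hypothetical: unlink p. */
    void free_element(struct foo *p);     /* Hypothetical: free p. */

    static void reclaim_after_grace_period(struct foo *p)
    {
            unsigned long cookie;

            remove_element(p);  /* New readers can no longer find p. */
            cookie = get_state_synchronize_rcu();  /* Snapshot GP state. */
            while (!poll_state_synchronize_rcu(cookie))
                    cond_resched();  /* Poll until a grace period elapses. */
            free_element(p);  /* Pre-existing readers are now done with p. */
    }
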
This approach must be extended to include idle CPUs, which need
RCU's grace-period memory ordering guarantee to extend to any
RCU read-side critical sections preceding and following the current
8 changes: 5 additions & 3 deletions Documentation/RCU/Design/Requirements/Requirements.rst
@@ -362,9 +362,8 @@ do_something_gp() uses rcu_dereference() to fetch from ``gp``:
12 }

The rcu_dereference() uses volatile casts and (for DEC Alpha) memory
-barriers in the Linux kernel. Should a `high-quality implementation of
-C11 ``memory_order_consume``
-[PDF] <http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf>`__
+barriers in the Linux kernel. Should a |high-quality implementation of
+C11 memory_order_consume [PDF]|_
ever appear, then rcu_dereference() could be implemented as a
``memory_order_consume`` load. Regardless of the exact implementation, a
pointer fetched by rcu_dereference() may not be used outside of the
@@ -374,6 +373,9 @@ element has been passed from RCU to some other
mechanism, most commonly locking or `reference
counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.

+.. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF]
+.. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf

In short, updaters use rcu_assign_pointer() and readers use
rcu_dereference(), and these two RCU API elements work together to
ensure that readers have a consistent view of newly added data elements.
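
As a concrete illustration of that division of labor, here is a hedged
sketch; struct foo, gp, and the function names are illustrative only, not
part of this commit::

    #include <linux/errno.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {
            int a;
    };

    static struct foo __rcu *gp;  /* RCU-protected global pointer. */

    /* Updater: fully initialize, then publish via rcu_assign_pointer(). */
    static int publish(int val)
    {
            struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);

            if (!p)
                    return -ENOMEM;
            p->a = val;
            rcu_assign_pointer(gp, p);  /* Orders init before publication. */
            return 0;
    }

    /* Reader: subscribe via rcu_dereference() in the critical section. */
    static int read_a(void)
    {
            struct foo *p;
            int ret = 0;

            rcu_read_lock();
            p = rcu_dereference(gp);  /* Dependency ordering covers p->a. */
            if (p)
                    ret = p->a;
            rcu_read_unlock();
            return ret;
    }
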
24 changes: 12 additions & 12 deletions Documentation/RCU/checklist.rst
@@ -37,7 +37,7 @@ over a rather long period of time, but improvements are always welcome!

1. Does the update code have proper mutual exclusion?

-RCU does allow -readers- to run (almost) naked, but -writers- must
+RCU does allow *readers* to run (almost) naked, but *writers* must
still use some sort of mutual exclusion, such as:

a. locking,
@@ -73,7 +73,7 @@ over a rather long period of time, but improvements are always welcome!
critical section is every bit as bad as letting them leak out
from under a lock. Unless, of course, you have arranged some
other means of protection, such as a lock or a reference count
--before- letting them out of the RCU read-side critical section.
+*before* letting them out of the RCU read-side critical section.

3. Does the update code tolerate concurrent accesses?

@@ -101,7 +101,7 @@ over a rather long period of time, but improvements are always welcome!
c. Make updates appear atomic to readers. For example,
pointer updates to properly aligned fields will
appear atomic, as will individual atomic primitives.
-Sequences of operations performed under a lock will -not-
+Sequences of operations performed under a lock will *not*
Sequences of operations performed under a lock will *not*
appear to be atomic to RCU readers, nor will sequences
of multiple atomic primitives.
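For illustration, a hedged sketch of point (c): replace a whole structure
through a single aligned pointer store rather than updating its fields in
place (struct cfg, cur_cfg, and cfg_mutex are illustrative only)::

    #include <linux/errno.h>
    #include <linux/mutex.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct cfg {
            int threshold;
            int timeout;
    };

    static struct cfg __rcu *cur_cfg;
    static DEFINE_MUTEX(cfg_mutex);  /* Update-side exclusion (item 1). */

    static int update_cfg(int threshold, int timeout)
    {
            struct cfg *newc = kmalloc(sizeof(*newc), GFP_KERNEL);
            struct cfg *oldc;

            if (!newc)
                    return -ENOMEM;
            newc->threshold = threshold;
            newc->timeout = timeout;
            mutex_lock(&cfg_mutex);
            oldc = rcu_dereference_protected(cur_cfg,
                                             lockdep_is_held(&cfg_mutex));
            rcu_assign_pointer(cur_cfg, newc);  /* Atomic to readers. */
            mutex_unlock(&cfg_mutex);
            synchronize_rcu();  /* Wait out readers still using oldc. */
            kfree(oldc);
            return 0;
    }
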

Expand Down Expand Up @@ -333,7 +333,7 @@ over a rather long period of time, but improvements are always welcome!
for example) may be omitted.

10. Conversely, if you are in an RCU read-side critical section,
and you don't hold the appropriate update-side lock, you -must-
and you don't hold the appropriate update-side lock, you *must*
use the "_rcu()" variants of the list macros. Failing to do so
will break Alpha, cause aggressive compilers to generate bad code,
and confuse people trying to read your code.
@@ -359,12 +359,12 @@ over a rather long period of time, but improvements are always welcome!
callback pending, then that RCU callback will execute on some
surviving CPU. (If this was not the case, a self-spawning RCU
callback would prevent the victim CPU from ever going offline.)
-Furthermore, CPUs designated by rcu_nocbs= might well -always-
+Furthermore, CPUs designated by rcu_nocbs= might well *always*
have their RCU callbacks executed on some other CPUs, in fact,
for some real-time workloads, this is the whole point of using
the rcu_nocbs= kernel boot parameter.

-13. Unlike other forms of RCU, it -is- permissible to block in an
+13. Unlike other forms of RCU, it *is* permissible to block in an
SRCU read-side critical section (demarked by srcu_read_lock()
and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
Please note that if you don't need to sleep in read-side critical
@@ -411,16 +411,16 @@ over a rather long period of time, but improvements are always welcome!
14. The whole point of call_rcu(), synchronize_rcu(), and friends
is to wait until all pre-existing readers have finished before
carrying out some otherwise-destructive operation. It is
-therefore critically important to -first- remove any path
+therefore critically important to *first* remove any path
that readers can follow that could be affected by the
-destructive operation, and -only- -then- invoke call_rcu(),
+destructive operation, and *only then* invoke call_rcu(),
synchronize_rcu(), or friends.

Because these primitives only wait for pre-existing readers, it
is the caller's responsibility to guarantee that any subsequent
readers will execute safely.

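For illustration, a hedged sketch of this remove-then-reclaim ordering;
struct foo, foo_lock, and the helper names are illustrative only::

    #include <linux/rculist.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct foo {
            struct list_head list;  /* On an RCU-protected list. */
            struct rcu_head rcu;
    };

    static DEFINE_SPINLOCK(foo_lock);  /* Update-side lock. */

    static void foo_reclaim(struct rcu_head *rhp)
    {
            kfree(container_of(rhp, struct foo, rcu));
    }

    static void foo_delete(struct foo *p)
    {
            spin_lock(&foo_lock);
            list_del_rcu(&p->list);  /* First: new readers cannot find p. */
            spin_unlock(&foo_lock);
            call_rcu(&p->rcu, foo_reclaim);  /* Only then: reclaim later. */
    }
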
-15. The various RCU read-side primitives do -not- necessarily contain
+15. The various RCU read-side primitives do *not* necessarily contain
memory barriers. You should therefore plan for the CPU
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
@@ -459,8 +459,8 @@ over a rather long period of time, but improvements are always welcome!
pass in a function defined within a loadable module, then it in
necessary to wait for all pending callbacks to be invoked after
the last invocation and before unloading that module. Note that
-it is absolutely -not- sufficient to wait for a grace period!
-The current (say) synchronize_rcu() implementation is -not-
+it is absolutely *not* sufficient to wait for a grace period!
+The current (say) synchronize_rcu() implementation is *not*
guaranteed to wait for callbacks registered on other CPUs.
Or even on the current CPU if that CPU recently went offline
and came back online.
@@ -470,7 +470,7 @@ over a rather long period of time, but improvements are always welcome!
- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()

-However, these barrier functions are absolutely -not- guaranteed
+However, these barrier functions are absolutely *not* guaranteed
to wait for a grace period. In fact, if there are no call_rcu()
callbacks waiting anywhere in the system, rcu_barrier() is within
its rights to return immediately.
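
For illustration, a hedged sketch of the module-unload sequencing described
above; stop_queueing_callbacks() stands in for whatever mechanism prevents
further call_rcu() invocations of this module's functions and is
hypothetical::

    #include <linux/module.h>
    #include <linux/rcupdate.h>

    void stop_queueing_callbacks(void);  /* Hypothetical. */

    static void __exit mymod_exit(void)
    {
            stop_queueing_callbacks();  /* No new callbacks after this. */
            rcu_barrier();  /* Wait for all queued callbacks to finish. */
            /* Only now may this module's code and data go away. */
    }
    module_exit(mymod_exit);
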
6 changes: 3 additions & 3 deletions Documentation/RCU/rcu_dereference.rst
@@ -43,7 +43,7 @@ Follow these rules to keep your RCU code working properly:
- Set bits and clear bits down in the must-be-zero low-order
bits of that pointer. This clearly means that the pointer
must have alignment constraints, for example, this does
--not- work in general for char* pointers.
+*not* work in general for char* pointers.

- XOR bits to translate pointers, as is done in some
classic buddy-allocator algorithms.
@@ -174,7 +174,7 @@ Follow these rules to keep your RCU code working properly:
Please see the "CONTROL DEPENDENCIES" section of
Documentation/memory-barriers.txt for more details.

-- The pointers are not equal -and- the compiler does
+- The pointers are not equal *and* the compiler does
not have enough information to deduce the value of the
pointer. Note that the volatile cast in rcu_dereference()
will normally prevent the compiler from knowing too much.
@@ -360,7 +360,7 @@ in turn destroying the ordering between this load and the loads of the
return values. This can result in "p->b" returning pre-initialization
garbage values.

-In short, rcu_dereference() is -not- optional when you are going to
+In short, rcu_dereference() is *not* optional when you are going to
dereference the resulting pointer.

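For illustration, a hedged sketch contrasting the required
rcu_dereference() with the broken plain load discussed above; struct foo
and gp are illustrative only::

    #include <linux/rcupdate.h>

    struct foo {
            int a;
            int b;
    };

    static struct foo __rcu *gp;

    static int sum_fields(void)
    {
            struct foo *p;
            int ret = 0;

            rcu_read_lock();
            p = rcu_dereference(gp);  /* Correct: ordered, single fetch. */
            /* p = (struct foo __force *)gp;  BROKEN: may be refetched. */
            if (p)
                    ret = p->a + p->b;
            rcu_read_unlock();
            return ret;
    }
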

31 changes: 20 additions & 11 deletions Documentation/RCU/stallwarn.rst
@@ -32,7 +32,7 @@ warnings:

- Booting Linux using a console connection that is too slow to
keep up with the boot-time console-message rate. For example,
-a 115Kbaud serial console can be -way- too slow to keep up
+a 115Kbaud serial console can be *way* too slow to keep up
with boot-time message rates, and will frequently result in
RCU CPU stall warning messages. Especially if you have added
debug printk()s.
@@ -105,7 +105,7 @@ warnings:
leading the realization that the CPU had failed.

The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
-Note that SRCU does -not- have CPU stall warnings. Please note that
+Note that SRCU does *not* have CPU stall warnings. Please note that
RCU only detects CPU stalls when there is a grace period in progress.
No grace period, no CPU stall warnings.

@@ -145,7 +145,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
this parameter is checked only at the beginning of a cycle.
So if you are 10 seconds into a 40-second stall, setting this
sysfs parameter to (say) five will shorten the timeout for the
--next- stall, or the following warning for the current stall
+*next* stall, or the following warning for the current stall
(assuming the stall lasts long enough). It will not affect the
timing of the next warning for the current stall.

@@ -189,8 +189,8 @@ rcupdate.rcu_task_stall_timeout
Interpreting RCU's CPU Stall-Detector "Splats"
==============================================

-For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following::
+For non-RCU-tasks flavors of RCU, when a CPU detects that some other
+CPU is stalling, it will print a message similar to the following::

INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@@ -202,8 +202,10 @@ causing stalls, and that the stall was affecting RCU-sched. This message
will normally be followed by stack dumps for each CPU. Please note that
PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
the tasks will be indicated by PID, for example, "P3421". It is even
-possible for an rcu_state stall to be caused by both CPUs -and- tasks,
+possible for an rcu_state stall to be caused by both CPUs *and* tasks,
in which case the offending CPUs and tasks will all be called out in the list.
+In some cases, CPUs will detect themselves stalling, which will result
+in a self-detected stall.

CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
the RCU core for the past three grace periods. In contrast, CPU 16's "(0
@@ -224,7 +226,7 @@ is the number that had executed since boot at the time that this CPU
last noted the beginning of a grace period, which might be the current
(stalled) grace period, or it might be some earlier grace period (for
example, if the CPU might have been in dyntick-idle mode for an extended
-time period. The number after the "/" is the number that have executed
+time period). The number after the "/" is the number that have executed
since boot until the current time. If this latter number stays constant
across repeated stall-warning messages, it is possible that RCU's softirq
handlers are no longer able to execute on this CPU. This can happen if
@@ -283,7 +285,8 @@ If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed::

-kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.

Starving the grace-period kthreads of CPU time can of course result
in RCU CPU stall warnings even when all CPUs and tasks have passed
@@ -313,15 +316,21 @@ is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not
change on successive RCU CPU stall warnings, there is further reason to
suspect a timer problem.

+These messages are usually followed by stack dumps of the CPUs and tasks
+involved in the stall. These stack traces can help you locate the cause
+of the stall, keeping in mind that the CPU detecting the stall will have
+an interrupt frame that is mainly devoted to detecting the stall.


Multiple Warnings From One Stall
================================

-If a stall lasts long enough, multiple stall-warning messages will be
-printed for it. The second and subsequent messages are printed at
+If a stall lasts long enough, multiple stall-warning messages will
+be printed for it. The second and subsequent messages are printed at
longer intervals, so that the time between (say) the first and second
message will be about three times the interval between the beginning
-of the stall and the first message.
+of the stall and the first message. It can be helpful to compare the
+stack dumps for the different messages for the same stalled grace period.


Stall Warnings for Expedited Grace Periods
35 changes: 17 additions & 18 deletions include/linux/rculist.h
@@ -10,15 +10,6 @@
#include <linux/list.h>
#include <linux/rcupdate.h>

-/*
-* Why is there no list_empty_rcu()? Because list_empty() serves this
-* purpose. The list_empty() function fetches the RCU-protected pointer
-* and compares it to the address of the list head, but neither dereferences
-* this pointer itself nor provides this pointer to the caller. Therefore,
-* it is not necessary to use rcu_dereference(), so that list_empty() can
-* be used anywhere you would want to use a list_empty_rcu().
-*/

/*
* INIT_LIST_HEAD_RCU - Initialize a list_head visible to RCU readers
* @list: list to be initialized
@@ -318,21 +309,29 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
/*
* Where are list_empty_rcu() and list_first_entry_rcu()?
*
-* Implementing those functions following their counterparts list_empty() and
-* list_first_entry() is not advisable because they lead to subtle race
-* conditions as the following snippet shows:
+* They do not exist because they would lead to subtle race conditions:
*
*	if (!list_empty_rcu(mylist)) {
*		struct foo *bar = list_first_entry_rcu(mylist, struct foo, list_member);
*		do_something(bar);
*	}
*
-* The list may not be empty when list_empty_rcu checks it, but it may be when
-* list_first_entry_rcu rereads the ->next pointer.
-*
-* Rereading the ->next pointer is not a problem for list_empty() and
-* list_first_entry() because they would be protected by a lock that blocks
-* writers.
+* The list might be non-empty when list_empty_rcu() checks it, but it
+* might have become empty by the time that list_first_entry_rcu() rereads
+* the ->next pointer, which would result in a SEGV.
+*
+* When not using RCU, it is OK for list_first_entry() to re-read that
+* pointer because both functions should be protected by some lock that
+* blocks writers.
+*
+* When using RCU, list_empty() uses READ_ONCE() to fetch the
+* RCU-protected ->next pointer and then compares it to the address of the
+* list head. However, it neither dereferences this pointer nor provides
+* this pointer to its caller. Thus, READ_ONCE() suffices (that is,
+* rcu_dereference() is not needed), which means that list_empty() can be
+* used anywhere you would want to use list_empty_rcu(). Just don't
+* expect anything useful to happen if you do a subsequent lockless
+* call to list_first_entry_rcu()!!!
*
* See list_first_or_null_rcu for an alternative.
*/
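
For illustration, a hedged sketch of that alternative; struct foo, mylist,
and do_something() are illustrative only::

    #include <linux/rculist.h>
    #include <linux/rcupdate.h>

    struct foo {
            struct list_head list_member;
            int data;
    };

    void do_something(struct foo *bar);  /* Hypothetical. */

    static void process_first(struct list_head *mylist)
    {
            struct foo *bar;

            rcu_read_lock();
            /* Single fetch of ->next: no empty-versus-first race. */
            bar = list_first_or_null_rcu(mylist, struct foo, list_member);
            if (bar)
                    do_something(bar);
            rcu_read_unlock();
    }
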
4 changes: 2 additions & 2 deletions include/linux/rcupdate.h
@@ -53,7 +53,7 @@ void __rcu_read_unlock(void);
* nesting depth, but makes sense only if CONFIG_PREEMPT_RCU -- in other
* types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
*/
-#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
+#define rcu_preempt_depth() READ_ONCE(current->rcu_read_lock_nesting)

#else /* #ifdef CONFIG_PREEMPT_RCU */

@@ -167,7 +167,7 @@ void synchronize_rcu_tasks(void);
# define synchronize_rcu_tasks synchronize_rcu
# endif

-# ifdef CONFIG_TASKS_RCU_TRACE
+# ifdef CONFIG_TASKS_TRACE_RCU
# define rcu_tasks_trace_qs(t) \
do { \
if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
3 changes: 0 additions & 3 deletions include/linux/rcutiny.h
@@ -14,9 +14,6 @@

#include <asm/param.h> /* for HZ */

-/* Never flag non-existent other CPUs! */
-static inline bool rcu_eqs_special_set(int cpu) { return false; }

unsigned long get_state_synchronize_rcu(void);
unsigned long start_poll_synchronize_rcu(void);
bool poll_state_synchronize_rcu(unsigned long oldstate);
8 changes: 4 additions & 4 deletions include/linux/srcutiny.h
@@ -61,7 +61,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
int idx;

idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
-WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 1);
+WRITE_ONCE(ssp->srcu_lock_nesting[idx], READ_ONCE(ssp->srcu_lock_nesting[idx]) + 1);
return idx;
}

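For illustration, a hedged sketch of the reader-side pattern whose nesting
count __srcu_read_lock() maintains; my_srcu, struct foo, my_data, and
do_something() are illustrative only::

    #include <linux/srcu.h>

    struct foo;
    void do_something(struct foo *p);  /* Hypothetical. */

    DEFINE_SRCU(my_srcu);
    static struct foo __rcu *my_data;

    static void srcu_reader(void)
    {
            struct foo *p;
            int idx;

            idx = srcu_read_lock(&my_srcu);  /* Save the returned index. */
            p = srcu_dereference(my_data, &my_srcu);
            if (p)
                    do_something(p);  /* Sleeping is permitted under SRCU. */
            srcu_read_unlock(&my_srcu, idx);  /* Pass the same index back. */
    }
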
@@ -81,11 +81,11 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
{
int idx;

-idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
+idx = ((data_race(READ_ONCE(ssp->srcu_idx)) + 1) & 0x2) >> 1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
tt, tf, idx,
-READ_ONCE(ssp->srcu_lock_nesting[!idx]),
-READ_ONCE(ssp->srcu_lock_nesting[idx]));
+data_race(READ_ONCE(ssp->srcu_lock_nesting[!idx])),
+data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])));
}

#endif
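
For illustration, a hedged note on the annotation distinction applied in
this hunk: READ_ONCE() forces a single, untorn load, while wrapping it in
data_race() additionally tells KCSAN that any concurrent write is expected
and harmless, which suits diagnostics-only reads such as the statistics
printout above::

    /* Illustrative only; not part of this commit. */
    static inline int srcu_idx_diag_snapshot(struct srcu_struct *ssp)
    {
            return data_race(READ_ONCE(ssp->srcu_idx));
    }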