Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/…
…git/paulmck/linux-rcu into core/rcu

Pull RCU updates from Paul E. McKenney:

"
 * Update RCU documentation.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/611.

 * Miscellaneous fixes.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/619.

 * Full-system idle detection.  This is for use by Frederic
   Weisbecker's adaptive-ticks mechanism.  Its purpose is
   to allow the timekeeping CPU to shut off its tick when
   all other CPUs are idle.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/648.

 * Improve rcutorture test coverage.  These were posted to LKML at
   https://lkml.org/lkml/2013/8/19/675.
"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar committed Sep 3, 2013
2 parents 6e46645 + 25f27ce commit 7d992fe
Showing 25 changed files with 1,610 additions and 833 deletions.
858 changes: 553 additions & 305 deletions Documentation/RCU/RTFP.txt

Large diffs are not rendered by default.

12 changes: 8 additions & 4 deletions Documentation/RCU/rcubarrier.txt
@@ -70,10 +70,14 @@ in realtime kernels in order to avoid excessive scheduling latencies.

rcu_barrier()

We instead need the rcu_barrier() primitive. This primitive is similar
to synchronize_rcu(), but instead of waiting solely for a grace
period to elapse, it also waits for all outstanding RCU callbacks to
complete. Pseudo-code using rcu_barrier() is as follows:
We instead need the rcu_barrier() primitive. Rather than waiting for
a grace period to elapse, rcu_barrier() waits for all outstanding RCU
callbacks to complete. Please note that rcu_barrier() does -not- imply
synchronize_rcu(), in particular, if there are no RCU callbacks queued
anywhere, rcu_barrier() is within its rights to return immediately,
without waiting for a grace period to elapse.

Pseudo-code using rcu_barrier() is as follows:

1. Prevent any new RCU callbacks from being posted.
2. Execute rcu_barrier().
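
The distinction drawn above can be modeled in userspace. The following is a toy sketch only, not kernel code: every toy_* name is hypothetical, the queue stands in for RCU's callback lists, and a counter stands in for actually waiting out a grace period. It shows steps 1 and 2 of the pseudo-code, and that the barrier returns immediately when no callbacks are queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rcu_head (hypothetical, userspace only). */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

static struct toy_head *toy_queue;	/* callbacks posted but not yet invoked */
static bool toy_accepting = true;	/* cleared by step 1 */
static unsigned long toy_gp_waits;	/* grace periods the barrier waited for */
static unsigned long toy_invoked;	/* callbacks invoked so far */

/* Models call_rcu(): refuses new callbacks once posting is stopped. */
static bool toy_call_rcu(struct toy_head *head, void (*func)(struct toy_head *))
{
	if (!toy_accepting)
		return false;
	head->func = func;
	head->next = toy_queue;
	toy_queue = head;
	return true;
}

/* Example callback: only counts invocations. */
static void toy_count_cb(struct toy_head *head)
{
	(void)head;
	toy_invoked++;
}

/* Models rcu_barrier(): returns at once if nothing is queued. */
static void toy_rcu_barrier(void)
{
	if (toy_queue == NULL)
		return;		/* no callbacks, so no grace-period wait */
	toy_gp_waits++;		/* queued callbacks run only after a grace period */
	while (toy_queue != NULL) {
		struct toy_head *head = toy_queue;

		toy_queue = head->next;
		head->func(head);
	}
}

/* Steps 1 and 2 of the pseudo-code. */
static void toy_quiesce(void)
{
	toy_accepting = false;	/* 1. prevent new callbacks from being posted */
	toy_rcu_barrier();	/* 2. wait for the ones already posted */
}
```

Calling toy_rcu_barrier() on an empty queue leaves toy_gp_waits at zero, which is the toy analogue of rcu_barrier() not implying synchronize_rcu().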
10 changes: 10 additions & 0 deletions Documentation/RCU/torture.txt
@@ -42,6 +42,16 @@ fqs_holdoff Holdoff time (in microseconds) between consecutive calls
fqs_stutter Wait time (in seconds) between consecutive bursts
of calls to force_quiescent_state().

gp_normal Make the fake writers use normal synchronous grace-period
primitives.

gp_exp Make the fake writers use expedited synchronous grace-period
primitives. If both gp_normal and gp_exp are set, or
if neither gp_normal nor gp_exp are set, then randomly
choose the primitive so that about 50% are normal and
50% expedited. By default, neither are set, which
gives best overall test coverage.
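
The selection rule described for gp_normal and gp_exp can be sketched as a small decision function. This is an illustration of the documented behavior, not the rcutorture implementation; the names demo_gp_kind and demo_pick_gp_kind are hypothetical, and the caller is assumed to supply the random value.

```c
#include <stdbool.h>

/* Which flavor of synchronous grace-period primitive a fake writer uses. */
enum demo_gp_kind { DEMO_GP_NORMAL, DEMO_GP_EXPEDITED };

/*
 * If exactly one of the two parameters is set, honor it.  If both or
 * neither are set, let the low bit of a caller-supplied random value
 * decide, which gives roughly a 50/50 split over many calls.
 */
static enum demo_gp_kind demo_pick_gp_kind(bool gp_normal, bool gp_exp,
					   unsigned int rnd)
{
	if (gp_normal == gp_exp)
		return (rnd & 1) ? DEMO_GP_EXPEDITED : DEMO_GP_NORMAL;
	return gp_exp ? DEMO_GP_EXPEDITED : DEMO_GP_NORMAL;
}
```

With the default of neither parameter set, the function takes the random branch, which matches the documented default of mixing both kinds.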

irqreader Says to invoke RCU readers from irq level. This is currently
done via timers. Defaults to "1" for variants of RCU that
permit this. (Or, more accurately, variants of RCU that do
10 changes: 6 additions & 4 deletions Documentation/memory-barriers.txt
@@ -531,9 +531,10 @@ dependency barrier to make it work correctly. Consider the following bit of
code:

q = &a;
if (p)
if (p) {
<data dependency barrier>
q = &b;
<data dependency barrier>
}
x = *q;

This will not have the desired effect because there is no actual data
@@ -542,9 +543,10 @@ attempting to predict the outcome in advance. In such a case what's actually
required is:

q = &a;
if (p)
if (p) {
<read barrier>
q = &b;
<read barrier>
}
x = *q;


44 changes: 34 additions & 10 deletions Documentation/timers/NO_HZ.txt
@@ -24,8 +24,8 @@ There are three main ways of managing scheduling-clock interrupts
workloads, you will normally -not- want this option.

These three cases are described in the following three sections, followed
by a third section on RCU-specific considerations and a fourth and final
section listing known issues.
by a third section on RCU-specific considerations, a fourth section
discussing testing, and a fifth and final section listing known issues.


NEVER OMIT SCHEDULING-CLOCK TICKS
@@ -121,14 +121,15 @@ boot parameter specifies the adaptive-ticks CPUs. For example,
"nohz_full=1,6-8" says that CPUs 1, 6, 7, and 8 are to be adaptive-ticks
CPUs. Note that you are prohibited from marking all of the CPUs as
adaptive-tick CPUs: At least one non-adaptive-tick CPU must remain
online to handle timekeeping tasks in order to ensure that system calls
like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no
running user processes to observe slight drifts in clock rate.)
Therefore, the boot CPU is prohibited from entering adaptive-ticks
mode. Specifying a "nohz_full=" mask that includes the boot CPU will
result in a boot-time error message, and the boot CPU will be removed
from the mask.
online to handle timekeeping tasks in order to ensure that system
calls like gettimeofday() returns accurate values on adaptive-tick CPUs.
(This is not an issue for CONFIG_NO_HZ_IDLE=y because there are no running
user processes to observe slight drifts in clock rate.) Therefore, the
boot CPU is prohibited from entering adaptive-ticks mode. Specifying a
"nohz_full=" mask that includes the boot CPU will result in a boot-time
error message, and the boot CPU will be removed from the mask. Note that
this means that your system must have at least two CPUs in order for
CONFIG_NO_HZ_FULL=y to do anything for you.
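
To make the "nohz_full=1,6-8" notation and the boot-CPU rule concrete, here is a minimal userspace sketch. It is not the kernel's parser: demo_parse_cpu_list() and demo_drop_boot_cpu() are hypothetical names, the mask is limited to 64 CPUs, and only comma-separated numbers and ranges are accepted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "1,6-8" into a bitmask (bit n set means CPU n).
 * Returns 0 on success, -1 on malformed input or a CPU number >= 64.
 */
static int demo_parse_cpu_list(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;	/* each item must start with a digit */
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {	/* range such as "6-8" */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*end == '\0')
			break;
		if (*end != ',')
			return -1;
		s = end + 1;
	}
	*mask = m;
	return 0;
}

/*
 * Remove the boot CPU from the mask, as the text says happens at boot.
 * Returns true if it had been requested, the case in which an error
 * message would be printed.  The caller ensures boot_cpu < 64.
 */
static bool demo_drop_boot_cpu(uint64_t *mask, unsigned int boot_cpu)
{
	uint64_t bit = UINT64_C(1) << boot_cpu;
	bool was_set = (*mask & bit) != 0;

	*mask &= ~bit;
	return was_set;
}
```

For "1,6-8" the resulting mask has bits 1, 6, 7, and 8 set. A list such as "0-3" on a system whose boot CPU is 0 ends up covering only CPUs 1 through 3.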

Alternatively, the CONFIG_NO_HZ_FULL_ALL=y Kconfig parameter specifies
that all CPUs other than the boot CPU are adaptive-ticks CPUs. This
@@ -232,6 +233,29 @@ scheduler will decide where to run them, which might or might not be
where you want them to run.


TESTING

So you enable all the OS-jitter features described in this document,
but do not see any change in your workload's behavior. Is this because
your workload isn't affected that much by OS jitter, or is it because
something else is in the way? This section helps answer this question
by providing a simple OS-jitter test suite, which is available on branch
master of the following git archive:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git

Clone this archive and follow the instructions in the README file.
This test procedure will produce a trace that will allow you to evaluate
whether or not you have succeeded in removing OS jitter from your system.
If this trace shows that you have removed OS jitter as much as is
possible, then you can conclude that your workload is not all that
sensitive to OS jitter.

Note: this test requires that your system have at least two CPUs.
We do not currently have a good way to remove OS jitter from single-CPU
systems.


KNOWN ISSUES

o Dyntick-idle slows transitions to and from idle slightly.
7 changes: 6 additions & 1 deletion include/asm-generic/vmlinux.lds.h
@@ -122,8 +122,12 @@
#define TRACE_PRINTKS() VMLINUX_SYMBOL(__start___trace_bprintk_fmt) = .; \
*(__trace_printk_fmt) /* Trace_printk fmt' pointer */ \
VMLINUX_SYMBOL(__stop___trace_bprintk_fmt) = .;
#define TRACEPOINT_STR() VMLINUX_SYMBOL(__start___tracepoint_str) = .; \
*(__tracepoint_str) /* Trace_printk fmt' pointer */ \
VMLINUX_SYMBOL(__stop___tracepoint_str) = .;
#else
#define TRACE_PRINTKS()
#define TRACEPOINT_STR()
#endif

#ifdef CONFIG_FTRACE_SYSCALLS
@@ -190,7 +194,8 @@
VMLINUX_SYMBOL(__stop___verbose) = .; \
LIKELY_PROFILE() \
BRANCH_PROFILE() \
TRACE_PRINTKS()
TRACE_PRINTKS() \
TRACEPOINT_STR()

/*
* Data section helpers
6 changes: 3 additions & 3 deletions include/linux/debugobjects.h
@@ -63,7 +63,7 @@ struct debug_obj_descr {
extern void debug_object_init (void *addr, struct debug_obj_descr *descr);
extern void
debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr);
extern void debug_object_activate (void *addr, struct debug_obj_descr *descr);
extern int debug_object_activate (void *addr, struct debug_obj_descr *descr);
extern void debug_object_deactivate(void *addr, struct debug_obj_descr *descr);
extern void debug_object_destroy (void *addr, struct debug_obj_descr *descr);
extern void debug_object_free (void *addr, struct debug_obj_descr *descr);
@@ -85,8 +85,8 @@ static inline void
debug_object_init (void *addr, struct debug_obj_descr *descr) { }
static inline void
debug_object_init_on_stack(void *addr, struct debug_obj_descr *descr) { }
static inline void
debug_object_activate (void *addr, struct debug_obj_descr *descr) { }
static inline int
debug_object_activate (void *addr, struct debug_obj_descr *descr) { return 0; }
static inline void
debug_object_deactivate(void *addr, struct debug_obj_descr *descr) { }
static inline void
34 changes: 34 additions & 0 deletions include/linux/ftrace_event.h
@@ -359,6 +359,40 @@ do { \
__trace_printk(ip, fmt, ##args); \
} while (0)

/**
* tracepoint_string - register constant persistent string to trace system
* @str - a constant persistent string that will be referenced in tracepoints
*
* If constant strings are being used in tracepoints, it is faster and
* more efficient to just save the pointer to the string and reference
* that with a printf "%s" instead of saving the string in the ring buffer
* and wasting space and time.
*
* The problem with the above approach is that userspace tools that read
* the binary output of the trace buffers do not have access to the string.
* Instead they just show the address of the string which is not very
* useful to users.
*
* With tracepoint_string(), the string will be registered to the tracing
* system and exported to userspace via the debugfs/tracing/printk_formats
* file that maps the string address to the string text. This way userspace
* tools that read the binary buffers have a way to map the pointers to
* the ASCII strings they represent.
*
* The @str used must be a constant string and persistent as it would not
* make sense to show a string that no longer exists. But it is still fine
* to be used with modules, because when modules are unloaded, if they
* had tracepoints, the ring buffers are cleared too. As long as the string
* does not change during the life of the module, it is fine to use
* tracepoint_string() within a module.
*/
#define tracepoint_string(str) \
({ \
static const char *___tp_str __tracepoint_string = str; \
___tp_str; \
})
#define __tracepoint_string __attribute__((section("__tracepoint_str")))
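
The mechanism behind tracepoint_string() (place a pointer in a named section, then walk the section between linker-provided start and stop symbols) can be tried in userspace. The sketch below assumes GCC or Clang targeting ELF with GNU ld or a compatible linker, which defines __start_<section> and __stop_<section> for sections whose names are valid C identifiers. All demo_* names are hypothetical, and the lookup function only stands in for the address-to-text mapping that the comment says is exported through printk_formats.

```c
#include <stddef.h>

/* "used" keeps the pointer even if the compiler could fold it away. */
#define demo_tracepoint_string(str)					\
	({								\
		static const char *demo_tp_str				\
			__attribute__((section("__tracepoint_str"), used)) = str; \
		demo_tp_str;						\
	})

/* Section bounds supplied by the linker (assumption: ELF toolchain). */
extern const char *__start___tracepoint_str[];
extern const char *__stop___tracepoint_str[];

/* Two registered strings; each macro use gets its own section slot. */
static const char *demo_event_name(int starting)
{
	return starting ? demo_tracepoint_string("demo_start")
			: demo_tracepoint_string("demo_end");
}

/* Number of string pointers that landed in the section. */
static size_t demo_registered_count(void)
{
	return (size_t)(__stop___tracepoint_str - __start___tracepoint_str);
}

/* Map a recorded address back to its text, or NULL if not registered. */
static const char *demo_lookup(const void *addr)
{
	const char **p;

	for (p = __start___tracepoint_str; p < __stop___tracepoint_str; p++)
		if ((const void *)*p == addr)
			return *p;
	return NULL;
}
```

A tracer that recorded only the pointer returned by demo_event_name() could later recover the text with demo_lookup(), whereas an address that was never registered yields NULL.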

#ifdef CONFIG_PERF_EVENTS
struct perf_event;

8 changes: 4 additions & 4 deletions include/linux/jiffies.h
@@ -101,13 +101,13 @@ static inline u64 get_jiffies_64(void)
#define time_after(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)(b) - (long)(a) < 0))
((long)((b) - (a)) < 0))
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)(a) - (long)(b) >= 0))
((long)((a) - (b)) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)

/*
@@ -130,13 +130,13 @@ static inline u64 get_jiffies_64(void)
#define time_after64(a,b) \
(typecheck(__u64, a) && \
typecheck(__u64, b) && \
((__s64)(b) - (__s64)(a) < 0))
((__s64)((b) - (a)) < 0))
#define time_before64(a,b) time_after64(b,a)

#define time_after_eq64(a,b) \
(typecheck(__u64, a) && \
typecheck(__u64, b) && \
((__s64)(a) - (__s64)(b) >= 0))
((__s64)((a) - (b)) >= 0))
#define time_before_eq64(a,b) time_after_eq64(b,a)
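
The change in these macros moves the subtraction into unsigned arithmetic. The old form cast each operand to a signed type first, so the subtraction itself could overflow, which is undefined behavior in C. The new form subtracts as unsigned, which wraps by definition, and only then converts the difference to signed. A minimal userspace sketch of the new form follows; the demo_* functions are hypothetical stand-ins for the macros, without the typecheck() guards, and they rely on the usual two's-complement conversion that GCC and Clang provide.

```c
#include <limits.h>
#include <stdbool.h>

/* True if time a is after time b, even across counter wraparound. */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	/* Unsigned subtraction wraps; the cast happens once, afterwards. */
	return (long)(b - a) < 0;
}

/* True if time a is after or equal to time b. */
static bool demo_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}
```

For example, with b just below ULONG_MAX and a small value of a reached after the counter wrapped, demo_time_after(a, b) is true. The comparison is meaningful only while the two times are less than half the counter range apart.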

#define time_in_range64(a, b, c) \
5 changes: 3 additions & 2 deletions include/linux/rculist.h
@@ -267,8 +267,9 @@ static inline void list_splice_init_rcu(struct list_head *list,
*/
#define list_first_or_null_rcu(ptr, type, member) \
({struct list_head *__ptr = (ptr); \
struct list_head __rcu *__next = list_next_rcu(__ptr); \
likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
struct list_head *__next = ACCESS_ONCE(__ptr->next); \
likely(__ptr != __next) ? \
list_entry_rcu(__next, type, member) : NULL; \
})

/**
26 changes: 20 additions & 6 deletions include/linux/rcupdate.h
@@ -52,7 +52,7 @@ extern int rcutorture_runnable; /* for sysctl */
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
extern void rcutorture_record_test_transition(void);
extern void rcutorture_record_progress(unsigned long vernum);
extern void do_trace_rcu_torture_read(char *rcutorturename,
extern void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
@@ -65,7 +65,7 @@ static inline void rcutorture_record_progress(unsigned long vernum)
{
}
#ifdef CONFIG_RCU_TRACE
extern void do_trace_rcu_torture_read(char *rcutorturename,
extern void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
@@ -229,13 +229,9 @@ extern void rcu_irq_exit(void);
#ifdef CONFIG_RCU_USER_QS
extern void rcu_user_enter(void);
extern void rcu_user_exit(void);
extern void rcu_user_enter_after_irq(void);
extern void rcu_user_exit_after_irq(void);
#else
static inline void rcu_user_enter(void) { }
static inline void rcu_user_exit(void) { }
static inline void rcu_user_enter_after_irq(void) { }
static inline void rcu_user_exit_after_irq(void) { }
static inline void rcu_user_hooks_switch(struct task_struct *prev,
struct task_struct *next) { }
#endif /* CONFIG_RCU_USER_QS */
@@ -1015,4 +1011,22 @@ static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */


/* Only for use by adaptive-ticks code. */
#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
extern bool rcu_sys_is_idle(void);
extern void rcu_sysidle_force_exit(void);
#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */

static inline bool rcu_sys_is_idle(void)
{
return false;
}

static inline void rcu_sysidle_force_exit(void)
{
}

#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */


#endif /* __LINUX_RCUPDATE_H */