Skip to content

Commit

Permalink
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The biggest RCU changes in this cycle were:

   - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

   - Replace calls of RCU-bh and RCU-sched update-side functions to
     their vanilla RCU counterparts. This series is a step towards
     complete removal of the RCU-bh and RCU-sched update-side functions.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - Documentation updates, including a number of flavor-consolidation
     updates from Joel Fernandes.

   - Miscellaneous fixes.

   - Automate generation of the initrd filesystem used for rcutorture
     testing.

   - Convert spin_is_locked() assertions to instead use lockdep.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - SRCU updates, especially including a fix from Dennis Krein for a
     bag-on-head-class bug.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
  rcutorture: Don't do busted forward-progress testing
  rcutorture: Use 100ms buckets for forward-progress callback histograms
  rcutorture: Recover from OOM during forward-progress tests
  rcutorture: Print forward-progress test age upon failure
  rcutorture: Print time since GP end upon forward-progress failure
  rcutorture: Print histogram of CB invocation at OOM time
  rcutorture: Print GP age upon forward-progress failure
  rcu: Print per-CPU callback counts for forward-progress failures
  rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
  rcutorture: Dump grace-period diagnostics upon forward-progress OOM
  rcutorture: Prepare for asynchronous access to rcu_fwd_startat
  torture: Remove unnecessary "ret" variables
  rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
  rcutorture: Break up too-long rcu_torture_fwd_prog() function
  rcutorture: Remove cbflood facility
  torture: Bring any extra CPUs online during kernel startup
  rcutorture: Add call_rcu() flooding forward-progress tests
  rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
  tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
  net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
  ...
  • Loading branch information
torvalds committed Dec 26, 2018
2 parents eed9688 + 4bbfd74 commit 792bf4d
Show file tree
Hide file tree
Showing 88 changed files with 4,187 additions and 4,014 deletions.
499 changes: 0 additions & 499 deletions Documentation/RCU/Design/Data-Structures/BigTreeClassicRCUBH.svg

This file was deleted.

This file was deleted.

This file was deleted.

834 changes: 319 additions & 515 deletions Documentation/RCU/Design/Data-Structures/BigTreePreemptRCUBHdyntickCB.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
173 changes: 55 additions & 118 deletions Documentation/RCU/Design/Data-Structures/Data-Structures.html
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,6 @@ <h3>Introduction</h3>
The <tt>rcu_segcblist</tt> Structure</a>
<li> <a href="#The rcu_data Structure">
The <tt>rcu_data</tt> Structure</a>
<li> <a href="#The rcu_dynticks Structure">
The <tt>rcu_dynticks</tt> Structure</a>
<li> <a href="#The rcu_head Structure">
The <tt>rcu_head</tt> Structure</a>
<li> <a href="#RCU-Specific Fields in the task_struct Structure">
Expand Down Expand Up @@ -127,9 +125,11 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
</p><p>RCU currently permits up to a four-level tree, which on a 64-bit system
accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for
32-bit systems.
On the other hand, you can set <tt>CONFIG_RCU_FANOUT</tt> to be
as small as 2 if you wish, which would permit only 16 CPUs, which
is useful for testing.
On the other hand, you can set both <tt>CONFIG_RCU_FANOUT</tt> and
<tt>CONFIG_RCU_FANOUT_LEAF</tt> to be as small as 2, which would result
in a 16-CPU test using a 4-level tree.
This can be useful for testing large-system capabilities on small test
machines.

</p><p>This multi-level combining tree allows us to get most of the
performance and scalability
Expand All @@ -154,44 +154,9 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
keeping lock contention under control at all tree levels regardless
of the level of loading on the system.

</p><p>The Linux kernel actually supports multiple flavors of RCU
running concurrently, so RCU builds separate data structures for each
flavor.
For example, for <tt>CONFIG_TREE_RCU=y</tt> kernels, RCU provides
rcu_sched and rcu_bh, as shown below:

</p><p><img src="BigTreeClassicRCUBH.svg" alt="BigTreeClassicRCUBH.svg" width="33%">

</p><p>Energy efficiency is increasingly important, and for that
reason the Linux kernel provides <tt>CONFIG_NO_HZ_IDLE</tt>, which
turns off the scheduling-clock interrupts on idle CPUs, which in
turn allows those CPUs to attain deeper sleep states and to consume
less energy.
CPUs whose scheduling-clock interrupts have been turned off are
said to be in <i>dyntick-idle mode</i>.
RCU must handle dyntick-idle CPUs specially
because RCU would otherwise wake up each CPU on every grace period,
which would defeat the whole purpose of <tt>CONFIG_NO_HZ_IDLE</tt>.
RCU uses the <tt>rcu_dynticks</tt> structure to track
which CPUs are in dyntick idle mode, as shown below:

</p><p><img src="BigTreeClassicRCUBHdyntick.svg" alt="BigTreeClassicRCUBHdyntick.svg" width="33%">

</p><p>However, if a CPU is in dyntick-idle mode, it is in that mode
for all flavors of RCU.
Therefore, a single <tt>rcu_dynticks</tt> structure is allocated per
CPU, and all of a given CPU's <tt>rcu_data</tt> structures share
that <tt>rcu_dynticks</tt>, as shown in the figure.

</p><p>Kernels built with <tt>CONFIG_PREEMPT_RCU</tt> support
rcu_preempt in addition to rcu_sched and rcu_bh, as shown below:

</p><p><img src="BigTreePreemptRCUBHdyntick.svg" alt="BigTreePreemptRCUBHdyntick.svg" width="35%">

</p><p>RCU updaters wait for normal grace periods by registering
RCU callbacks, either directly via <tt>call_rcu()</tt> and
friends (namely <tt>call_rcu_bh()</tt> and <tt>call_rcu_sched()</tt>),
there being a separate interface per flavor of RCU)
or indirectly via <tt>synchronize_rcu()</tt> and friends.
RCU callbacks are represented by <tt>rcu_head</tt> structures,
which are queued on <tt>rcu_data</tt> structures while they are
Expand All @@ -214,9 +179,6 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
<li> Each <tt>rcu_node</tt> structure has a spinlock.
<li> The fields in <tt>rcu_data</tt> are private to the corresponding
CPU, although a few can be read and written by other CPUs.
<li> Similarly, the fields in <tt>rcu_dynticks</tt> are private
to the corresponding CPU, although a few can be read by
other CPUs.
</ol>

<p>It is important to note that different data structures can have
Expand Down Expand Up @@ -272,11 +234,6 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
access to this information from the corresponding CPU.
Finally, this structure records past dyntick-idle state
for the corresponding CPU and also tracks statistics.
<li> <tt>rcu_dynticks</tt>:
This per-CPU structure tracks the current dyntick-idle
state for the corresponding CPU.
Unlike the other three structures, the <tt>rcu_dynticks</tt>
structure is not replicated per RCU flavor.
<li> <tt>rcu_head</tt>:
This structure represents RCU callbacks, and is the
only structure allocated and managed by RCU users.
Expand All @@ -287,14 +244,14 @@ <h3><a name="Data-Structure Relationships">Data-Structure Relationships</a></h3>
<p>If all you wanted from this article was a general notion of how
RCU's data structures are related, you are done.
Otherwise, each of the following sections give more details on
the <tt>rcu_state</tt>, <tt>rcu_node</tt>, <tt>rcu_data</tt>,
and <tt>rcu_dynticks</tt> data structures.
the <tt>rcu_state</tt>, <tt>rcu_node</tt> and <tt>rcu_data</tt> data
structures.

<h3><a name="The rcu_state Structure">
The <tt>rcu_state</tt> Structure</a></h3>

<p>The <tt>rcu_state</tt> structure is the base structure that
represents a flavor of RCU.
represents the state of RCU in the system.
This structure forms the interconnection between the
<tt>rcu_node</tt> and <tt>rcu_data</tt> structures,
tracks grace periods, contains the lock used to
Expand Down Expand Up @@ -389,7 +346,7 @@ <h5>Grace-Period Tracking</h5>
The bottom two bits are the state of the current grace period,
which can be zero for not yet started or one for in progress.
In other words, if the bottom two bits of <tt>-&gt;gp_seq</tt> are
zero, the corresponding flavor of RCU is idle.
zero, then RCU is idle.
Any other value in the bottom two bits indicates that something is broken.
This field is protected by the root <tt>rcu_node</tt> structure's
<tt>-&gt;lock</tt> field.
Expand Down Expand Up @@ -419,10 +376,10 @@ <h5>Miscellaneous</h5>
grace period in jiffies.
It is protected by the root <tt>rcu_node</tt>'s <tt>-&gt;lock</tt>.

<p>The <tt>-&gt;name</tt> field points to the name of the RCU flavor
(for example, &ldquo;rcu_sched&rdquo;), and is constant.
The <tt>-&gt;abbr</tt> field contains a one-character abbreviation,
for example, &ldquo;s&rdquo; for RCU-sched.
<p>The <tt>-&gt;name</tt> and <tt>-&gt;abbr</tt> fields distinguish
between preemptible RCU (&ldquo;rcu_preempt&rdquo; and &ldquo;p&rdquo;)
and non-preemptible RCU (&ldquo;rcu_sched&rdquo; and &ldquo;s&rdquo;).
These fields are used for diagnostic and tracing purposes.

<h3><a name="The rcu_node Structure">
The <tt>rcu_node</tt> Structure</a></h3>
Expand Down Expand Up @@ -971,25 +928,31 @@ <h3><a name="The rcu_segcblist Structure">
pointer.
The reason for this is that all the ready-to-invoke callbacks
(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
all at once at callback-invocation time.
all at once at callback-invocation time (<tt>rcu_do_batch</tt>), due
to which <tt>-&gt;head</tt> may be set to NULL if there are no not-done
callbacks remaining in the <tt>rcu_segcblist</tt>.
If callback invocation must be postponed, for example, because a
high-priority process just woke up on this CPU, then the remaining
callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
Either way, the <tt>-&gt;len</tt> and <tt>-&gt;len_lazy</tt> counts
are adjusted after the corresponding callbacks have been invoked, and so
again it is the <tt>-&gt;len</tt> count that accurately reflects whether
or not there are callbacks associated with this <tt>rcu_segcblist</tt>
structure.
callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment and
<tt>-&gt;head</tt> once again points to the start of the segment.
In short, the head field can briefly be <tt>NULL</tt> even though the
CPU has callbacks present the entire time.
Therefore, it is not appropriate to test the <tt>-&gt;head</tt> pointer
for <tt>NULL</tt>.

<p>In contrast, the <tt>-&gt;len</tt> and <tt>-&gt;len_lazy</tt> counts
are adjusted only after the corresponding callbacks have been invoked.
This means that the <tt>-&gt;len</tt> count is zero only if
the <tt>rcu_segcblist</tt> structure really is devoid of callbacks.
Of course, off-CPU sampling of the <tt>-&gt;len</tt> count requires
the use of appropriate synchronization, for example, memory barriers.
careful use of appropriate synchronization, for example, memory barriers.
This synchronization can be a bit subtle, particularly in the case
of <tt>rcu_barrier()</tt>.

<h3><a name="The rcu_data Structure">
The <tt>rcu_data</tt> Structure</a></h3>

<p>The <tt>rcu_data</tt> maintains the per-CPU state for the
corresponding flavor of RCU.
<p>The <tt>rcu_data</tt> maintains the per-CPU state for the RCU subsystem.
The fields in this structure may be accessed only from the corresponding
CPU (and from tracing) unless otherwise stated.
This structure is the
Expand All @@ -1015,30 +978,19 @@ <h5>Connection to Other Data Structures</h5>

<pre>
1 int cpu;
2 struct rcu_state *rsp;
3 struct rcu_node *mynode;
4 struct rcu_dynticks *dynticks;
5 unsigned long grpmask;
6 bool beenonline;
2 struct rcu_node *mynode;
3 unsigned long grpmask;
4 bool beenonline;
</pre>

<p>The <tt>-&gt;cpu</tt> field contains the number of the
corresponding CPU, the <tt>-&gt;rsp</tt> pointer references
the corresponding <tt>rcu_state</tt> structure (and is most frequently
used to locate the name of the corresponding flavor of RCU for tracing),
and the <tt>-&gt;mynode</tt> field references the corresponding
<tt>rcu_node</tt> structure.
corresponding CPU and the <tt>-&gt;mynode</tt> field references the
corresponding <tt>rcu_node</tt> structure.
The <tt>-&gt;mynode</tt> is used to propagate quiescent states
up the combining tree.
<p>The <tt>-&gt;dynticks</tt> pointer references the
<tt>rcu_dynticks</tt> structure corresponding to this
CPU.
Recall that a single per-CPU instance of the <tt>rcu_dynticks</tt>
structure is shared among all flavors of RCU.
These first four fields are constant and therefore require not
synchronization.
These two fields are constant and therefore do not require synchronization.

</p><p>The <tt>-&gt;grpmask</tt> field indicates the bit in
<p>The <tt>-&gt;grpmask</tt> field indicates the bit in
the <tt>-&gt;mynode-&gt;qsmask</tt> corresponding to this
<tt>rcu_data</tt> structure, and is also used when propagating
quiescent states.
Expand All @@ -1057,12 +1009,12 @@ <h5>Quiescent-State and Grace-Period Tracking</h5>
3 bool cpu_no_qs;
4 bool core_needs_qs;
5 bool gpwrap;
6 unsigned long rcu_qs_ctr_snap;
</pre>

<p>The <tt>-&gt;gp_seq</tt> and <tt>-&gt;gp_seq_needed</tt>
fields are the counterparts of the fields of the same name
in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures.
<p>The <tt>-&gt;gp_seq</tt> field is the counterpart of the field of the same
name in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures. The
<tt>-&gt;gp_seq_needed</tt> field is the counterpart of the field of the same
name in the rcu_node</tt> structure.
They may each lag up to one behind their <tt>rcu_node</tt>
counterparts, but in <tt>CONFIG_NO_HZ_IDLE</tt> and
<tt>CONFIG_NO_HZ_FULL</tt> kernels can lag
Expand Down Expand Up @@ -1103,10 +1055,6 @@ <h5>Quiescent-State and Grace-Period Tracking</h5>
<tt>gp_seq</tt> counter is in danger of overflow, which
will cause the CPU to disregard the values of its counters on
its next exit from idle.
Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
cases where a given operation has resulted in a quiescent state
for all flavors of RCU, for example, <tt>cond_resched()</tt>
when RCU has indicated a need for quiescent states.

<h5>RCU Callback Handling</h5>

Expand Down Expand Up @@ -1179,26 +1127,22 @@ <h5>Dyntick-Idle Handling</h5>
count the number of times this CPU is determined to be in
dyntick-idle state, and is used for tracing and debugging purposes.

<h3><a name="The rcu_dynticks Structure">
The <tt>rcu_dynticks</tt> Structure</a></h3>

<p>The <tt>rcu_dynticks</tt> maintains the per-CPU dyntick-idle state
for the corresponding CPU.
Unlike the other structures, <tt>rcu_dynticks</tt> is not
replicated over the different flavors of RCU.
The fields in this structure may be accessed only from the corresponding
CPU (and from tracing) unless otherwise stated.
Its fields are as follows:
<p>
This portion of the rcu_data structure is declared as follows:

<pre>
1 long dynticks_nesting;
2 long dynticks_nmi_nesting;
3 atomic_t dynticks;
4 bool rcu_need_heavy_qs;
5 unsigned long rcu_qs_ctr;
6 bool rcu_urgent_qs;
5 bool rcu_urgent_qs;
</pre>

<p>These fields in the rcu_data structure maintain the per-CPU dyntick-idle
state for the corresponding CPU.
The fields may be accessed only from the corresponding CPU (and from tracing)
unless otherwise stated.

<p>The <tt>-&gt;dynticks_nesting</tt> field counts the
nesting depth of process execution, so that in normal circumstances
this counter has value zero or one.
Expand Down Expand Up @@ -1240,19 +1184,12 @@ <h3><a name="The rcu_dynticks Structure">
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
code, which provide a momentary idle sojourn in response.

</p><p>The <tt>-&gt;rcu_qs_ctr</tt> field is used to record
quiescent states from <tt>cond_resched()</tt>.
Because <tt>cond_resched()</tt> can execute quite frequently, this
must be quite lightweight, as in a non-atomic increment of this
per-CPU field.

</p><p>Finally, the <tt>-&gt;rcu_urgent_qs</tt> field is used to record
the fact that the RCU core code would really like to see a quiescent
state from the corresponding CPU, with the various other fields indicating
just how badly RCU wants this quiescent state.
This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
code, which, if nothing else, non-atomically increment <tt>-&gt;rcu_qs_ctr</tt>
in response.
the fact that the RCU core code would really like to see a quiescent state from
the corresponding CPU, with the various other fields indicating just how badly
RCU wants this quiescent state.
This flag is checked by RCU's context-switch path
(<tt>rcu_note_context_switch</tt>) and the cond_resched code.

<table>
<tr><th>&nbsp;</th></tr>
Expand Down Expand Up @@ -1425,11 +1362,11 @@ <h3><a name="Accessor Functions">
<h3><a name="Summary">
Summary</a></h3>

So each flavor of RCU is represented by an <tt>rcu_state</tt> structure,
So the state of RCU is represented by an <tt>rcu_state</tt> structure,
which contains a combining tree of <tt>rcu_node</tt> and
<tt>rcu_data</tt> structures.
Finally, in <tt>CONFIG_NO_HZ_IDLE</tt> kernels, each CPU's dyntick-idle
state is tracked by an <tt>rcu_dynticks</tt> structure.
state is tracked by dynticks-related fields in the <tt>rcu_data</tt> structure.

If you made it this far, you are well prepared to read the code
walkthroughs in the other articles in this series.
Expand Down
Loading

0 comments on commit 792bf4d

Please sign in to comment.