yaml
---
r: 212055
b: refs/heads/master
c: 8d8d2e9
h: refs/heads/master
i:
  212053: d8d00a8
  212051: 46b6e04
  212047: 06df4fa
v: v3
Linus Torvalds committed Oct 21, 2010
1 parent ce9cadf commit 7322e17
Showing 1,820 changed files with 36,357 additions and 22,332 deletions.
2 changes: 1 addition & 1 deletion [refs]
@@ -1,2 +1,2 @@
---
refs/heads/master: 3b4b682becdfa9f42321aa024d5cc84f71f06d8c
refs/heads/master: 8d8d2e9ccd331a1345c88b292ebee9d256fd8749
8 changes: 4 additions & 4 deletions trunk/CREDITS
@@ -3554,12 +3554,12 @@ E: cvance@nai.com
D: portions of the Linux Security Module (LSM) framework and security modules

N: Petr Vandrovec
E: vandrove@vc.cvut.cz
E: petr@vandrovec.name
D: Small contributions to ncpfs
D: Matrox framebuffer driver
S: Chudenicka 8
S: 10200 Prague 10, Hostivar
S: Czech Republic
S: 21513 Conradia Ct
S: Cupertino, CA 95014
S: USA

N: Thibaut Varene
E: T-Bone@parisc-linux.org
1 change: 0 additions & 1 deletion trunk/Documentation/DocBook/device-drivers.tmpl
@@ -46,7 +46,6 @@

<sect1><title>Atomic and pointer manipulation</title>
!Iarch/x86/include/asm/atomic.h
!Iarch/x86/include/asm/unaligned.h
</sect1>

<sect1><title>Delaying, scheduling, and timer routines</title>
1 change: 0 additions & 1 deletion trunk/Documentation/DocBook/kernel-api.tmpl
@@ -57,7 +57,6 @@
</para>

<sect1><title>String Conversions</title>
!Ilib/vsprintf.c
!Elib/vsprintf.c
</sect1>
<sect1><title>String Manipulation</title>
20 changes: 10 additions & 10 deletions trunk/Documentation/DocBook/kernel-locking.tmpl
@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
the readers are finished.
all pre-existing readers are finished. Alternatively,
<function>synchronize_rcu()</function> may be used to block until
all pre-existing readers are finished.
</para>
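The deferred-destruction pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: mock_call_rcu() and mock_grace_period_elapsed() are hypothetical stand-ins written for this sketch, not the kernel's RCU implementation, and the callback uses the struct rcu_head argument form together with a local container_of() macro to find the enclosing object.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; hypothetical, for illustration only. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct object {
	int id;
	struct rcu_head rcu;
};

static struct rcu_head *pending;	/* callbacks waiting for a "grace period" */
static int objects_freed;

/* Stand-in for call_rcu(): only queues the callback, never runs it directly. */
static void mock_call_rcu(struct rcu_head *head,
			  void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: all pre-existing readers are done. */
static void mock_grace_period_elapsed(void)
{
	while (pending) {
		struct rcu_head *head = pending;

		pending = head->next;	/* unlink before the callback frees it */
		head->func(head);
	}
}

/* The callback recovers the enclosing object from its embedded rcu_head. */
static void cache_delete_rcu(struct rcu_head *head)
{
	struct object *obj = container_of(head, struct object, rcu);

	free(obj);
	objects_freed++;
}
```

The point of the sketch is the ordering: the object is unlinked and handed to the callback queue at once, but it is destroyed only after the simulated grace period ends.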
<para>
But how does Read Copy Update know when the readers are
@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
- object_put(obj);
+ list_del_rcu(&amp;obj-&gt;list);
cache_num--;
+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu, obj);
+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
}

/* Must be holding cache_lock */
@@ -1725,14 +1727,6 @@ the amount of locking which needs to be done.
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &amp;cache, list) {
@@ -85,6 +94,7 @@
obj-&gt;popularity = 0;
atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
spin_lock_init(&amp;obj-&gt;lock);
+ INIT_RCU_HEAD(&amp;obj-&gt;rcu);

spin_lock_irqsave(&amp;cache_lock, flags);
__cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
@@ -1961,6 +1955,12 @@ machines due to caching.
</sect1>
</chapter>

<chapter id="apiref">
<title>Mutex API reference</title>
!Iinclude/linux/mutex.h
!Ekernel/mutex.c
</chapter>

<chapter id="references">
<title>Further reading</title>

5 changes: 5 additions & 0 deletions trunk/Documentation/DocBook/tracepoint.tmpl
@@ -104,4 +104,9 @@
<title>Block IO</title>
!Iinclude/trace/events/block.h
</chapter>

<chapter id="workqueue">
<title>Workqueue</title>
!Iinclude/trace/events/workqueue.h
</chapter>
</book>
46 changes: 39 additions & 7 deletions trunk/Documentation/RCU/checklist.txt
@@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome!
include:

a. Keeping a count of the number of data-structure elements
used by the RCU-protected data structure, including those
waiting for a grace period to elapse. Enforce a limit
on this number, stalling updates as needed to allow
previously deferred frees to complete.

Alternatively, limit only the number awaiting deferred
free rather than the total number of elements.
used by the RCU-protected data structure, including
those waiting for a grace period to elapse. Enforce a
limit on this number, stalling updates as needed to allow
previously deferred frees to complete. Alternatively,
limit only the number awaiting deferred free rather than
the total number of elements.

One way to stall the updates is to acquire the update-side
mutex. (Don't try this with a spinlock -- other CPUs
spinning on the lock could prevent the grace period
from ever ending.) Another way to stall the updates
is for the updates to use a wrapper function around
the memory allocator, so that this wrapper function
simulates OOM when there is too much memory awaiting an
RCU grace period. There are of course many other
variations on this theme.

b. Limiting update rate. For example, if updates occur only
once per hour, then no explicit rate limiting is required,
@@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome!
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.

17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
the __rcu sparse checks to validate your RCU code. These
can help find problems as follows:

CONFIG_PROVE_RCU: check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.

CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).

__rcu sparse checks: tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().

These debugging aids can help you find problems that are
otherwise extremely difficult to spot.
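The allocator-wrapper idea from item a. of the checklist hunk above (simulate OOM when too much memory is awaiting a grace period) can be sketched as follows. This is a single-threaded userspace illustration: throttled_alloc(), note_deferred_free(), note_free_completed() and MAX_PENDING_FREES are hypothetical names invented for this sketch, and kernel code would need an atomic or per-CPU counter instead of a plain variable.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PENDING_FREES 4	/* hypothetical limit, chosen for illustration */

static size_t pending_frees;	/* elements awaiting a grace period */

/* Allocation wrapper: simulates OOM while too many frees are still deferred. */
static void *throttled_alloc(size_t size)
{
	if (pending_frees >= MAX_PENDING_FREES)
		return NULL;
	return malloc(size);
}

/* Called where the update path would hand an element to call_rcu(). */
static void note_deferred_free(void)
{
	pending_frees++;
}

/* Called from the (simulated) RCU callback once the element is really freed. */
static void note_free_completed(void)
{
	if (pending_frees > 0)
		pending_frees--;
}
```

Updaters that see a NULL return back off exactly as they would under real memory pressure, which stalls updates until some deferred frees complete.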
18 changes: 18 additions & 0 deletions trunk/Documentation/RCU/stallwarn.txt
@@ -80,6 +80,24 @@ o A CPU looping with bottom halves disabled. This condition can
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().

o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
in which case the next RCU grace period can never complete, which
will eventually cause the system to run out of memory and hang.
While the system is in the process of running itself out of
memory, you might see stall-warning messages.

o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
RCU grace periods from ever completing. Either way, the
system will eventually run out of memory and hang. In the
CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
messages.

o A bug in the RCU implementation.

o A hardware failure. This is quite unlikely, but has occurred
13 changes: 12 additions & 1 deletion trunk/Documentation/RCU/trace.txt
@@ -125,6 +125,17 @@ o "b" is the batch limit for this CPU. If more than this number
of RCU callbacks is ready to invoke, then the remainder will
be deferred.

o "ci" is the number of RCU callbacks that have been invoked for
this CPU. Note that ci+ql is the number of callbacks that have
been registered in absence of CPU-hotplug activity.

o "co" is the number of RCU callbacks that have been orphaned due to
this CPU going offline.

o "ca" is the number of RCU callbacks that have been adopted due to
other CPUs going offline. Note that ci+co-ca+ql is the number of
RCU callbacks registered on this CPU.
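The relationship between these counters can be written out as a small helper. The field names follow the trace output; the struct and function are hypothetical, and "ql" is assumed here to be the number of callbacks currently queued on the CPU, since its definition lies outside this hunk.

```c
/* Per-CPU counters as named in the trace output; the struct is hypothetical. */
struct rcudata_counts {
	long ql;	/* callbacks currently queued (assumed meaning) */
	long ci;	/* callbacks invoked on this CPU */
	long co;	/* callbacks orphaned when this CPU went offline */
	long ca;	/* callbacks adopted from other CPUs going offline */
};

/* Number of callbacks registered on this CPU: ci + co - ca + ql. */
static long rcu_callbacks_registered(const struct rcudata_counts *c)
{
	return c->ci + c->co - c->ca + c->ql;
}
```

With no CPU-hotplug activity, co and ca are both zero and the expression reduces to ci+ql, matching the note on "ci".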

There is also an rcu/rcudata.csv file with the same information in
comma-separated-variable spreadsheet format.

@@ -180,7 +191,7 @@ o "s" is the "signaled" state that drives force_quiescent_state()'s

o "jfq" is the number of jiffies remaining for this grace period
before force_quiescent_state() is invoked to help push things
along. Note that CPUs in dyntick-idle mode thoughout the grace
along. Note that CPUs in dyntick-idle mode throughout the grace
period will not report on their own, but rather must be checked by
some other CPU via force_quiescent_state().

45 changes: 45 additions & 0 deletions trunk/Documentation/block/cfq-iosched.txt
@@ -0,0 +1,45 @@
CFQ ioscheduler tunables
========================

slice_idle
----------
This specifies how long CFQ should idle for next request on certain cfq queues
(for sequential workloads) and service trees (for random workloads) before
queue is expired and CFQ selects next queue to dispatch from.

By default slice_idle is a non-zero value. That means by default we idle on
queues/service trees. This can be very helpful on highly seeky media like
single spindle SATA/SAS disks where we can cut down on overall number of
seeks and see improved throughput.

Setting slice_idle to 0 will remove all the idling on queues/service tree
level and one should see an overall improved throughput on faster storage
devices like multiple SATA/SAS disks in hardware RAID configuration. The down
side is that isolation provided from WRITES also goes down and notion of
IO priority becomes weaker.

So depending on storage and workload, it might be useful to set slice_idle=0.
In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
keeping slice_idle enabled should be useful. For any configurations where
there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.

CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
process gets bigger time slice and lower priority process gets smaller time
slice. Measuring time becomes harder if storage is fast and supports NCQ and
it would be better to dispatch multiple requests from multiple cfq queues in
request queue at a time. In such scenario, it is not possible to measure time
consumed by single queue accurately.

What is possible though is to measure number of requests dispatched from a
single queue and also allow dispatch from multiple cfq queue at the same time.
This effectively becomes the fairness in terms of IOPS (IO operations per
second).

If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
to IOPS mode and starts providing fairness in terms of number of requests
dispatched. Note that this mode switching takes effect only for group
scheduling. For non-cgroup users nothing should change.
28 changes: 28 additions & 0 deletions trunk/Documentation/cgroups/blkio-controller.txt
@@ -217,6 +217,7 @@ Details of cgroup files
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/group_isolation
-----------------------------------------------

If group_isolation=1, it provides stronger isolation between groups at the
expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient
and one wants stronger isolation between groups, then set group_isolation=1
but this will come at cost of reduced throughput.

/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On faster hardware CFQ can be slow, especially with sequential workloads.
This happens because CFQ idles on a single queue and single queue might not
drive deeper request queue depths to keep the storage busy. In such scenarios
one can try setting slice_idle=0 and that would switch CFQ to IOPS
(IO operations per second) mode on NCQ supporting hardware.

That means CFQ will not idle between cfq queues of a cfq group and hence be
able to drive higher queue depth and achieve better throughput. That also
means that cfq provides fairness among groups in terms of IOPS and not in
terms of disk time.

/sys/block/<disk>/queue/iosched/group_idle
------------------------------------------
If one disables idling on individual cfq queues and cfq service trees by
setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
on the group in an attempt to provide fairness among groups.

By default group_idle is same as slice_idle and does not do anything if
slice_idle is enabled.

One can experience an overall throughput drop if one has created multiple
groups and put applications in those groups which are not driving enough
IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
on individual groups and throughput should improve.
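A userspace program could set these tunables by writing to the sysfs files named above. The following is a minimal sketch, assuming the caller has permission to write the file; cfq_tunable_path() and write_tunable() are hypothetical helpers written for this example, and the disk name "sda" in the usage note is likewise only an example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<disk>/queue/iosched/<tunable>".
 * Returns 0 on success, -1 on a bad name or if the buffer is too small.
 */
static int cfq_tunable_path(char *buf, size_t len,
			    const char *disk, const char *tunable)
{
	int n;

	/* Refuse names that would escape the intended directory. */
	if (strchr(disk, '/') || strchr(tunable, '/'))
		return -1;
	n = snprintf(buf, len, "/sys/block/%s/queue/iosched/%s", disk, tunable);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer value to a tunable file; returns 0 on success, -1 on error. */
static int write_tunable(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, building the path for ("sda", "slice_idle") and writing 0 would request IOPS mode on NCQ-capable hardware, and doing the same for "group_idle" would disable idling on groups.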

What works
==========
- Currently only sync IO queues are supported. All the buffered writes are
23 changes: 20 additions & 3 deletions trunk/Documentation/cputopology.txt
@@ -14,25 +14,39 @@ to /proc/cpuinfo.
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.

3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
3) /sys/devices/system/cpu/cpuX/topology/book_id:

the book ID of cpuX. Typically it is the hardware platform's
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.

4) /sys/devices/system/cpu/cpuX/topology/thread_siblings:

internal kernel map of cpuX's hardware threads within the same
core as cpuX

4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
5) /sys/devices/system/cpu/cpuX/topology/core_siblings:

internal kernel map of cpuX's hardware threads within the same
physical_package_id.

6) /sys/devices/system/cpu/cpuX/topology/book_siblings:

internal kernel map of cpuX's hardware threads within the same
book_id.

To implement it in an architecture-neutral way, a new source file,
drivers/base/topology.c, is to export the 4 attributes.
drivers/base/topology.c, is to export the 4 or 6 attributes. The two book
related sysfs files will only be created if CONFIG_SCHED_BOOK is selected.

For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
#define topology_book_id(cpu)
#define topology_thread_cpumask(cpu)
#define topology_core_cpumask(cpu)
#define topology_book_cpumask(cpu)

The type of **_id is int.
The type of siblings is (const) struct cpumask *.
@@ -45,6 +59,9 @@ not defined by include/asm-XXX/topology.h:
3) thread_siblings: just the given CPU
4) core_siblings: just the given CPU

For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
default definitions for topology_book_id() and topology_book_cpumask().
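From userspace, the consequence is that book_id may simply be absent. A minimal sketch of reading one of the integer ID files and tolerating a missing file follows; parse_topology_id() and read_topology_id() are hypothetical helpers for this example, and the sketch assumes the ID files contain a single decimal integer.

```c
#include <stdio.h>

/* Parse the text of an ID file; returns 0 on success, -1 if no integer found. */
static int parse_topology_id(const char *text, int *id)
{
	return sscanf(text, "%d", id) == 1 ? 0 : -1;
}

/*
 * Read /sys/devices/system/cpu/cpu<cpu>/topology/<attr> as an integer.
 * Returns 0 on success, -1 if the file is absent or cannot be parsed.
 */
static int read_topology_id(unsigned int cpu, const char *attr, int *id)
{
	char path[128];
	char line[32];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/devices/system/cpu/cpu%u/topology/%s", cpu, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* e.g. book_id without CONFIG_SCHED_BOOK */
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_topology_id(line, id);
}
```

A caller would treat a -1 return for "book_id" as "this kernel does not expose books" rather than as an error.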

Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
source for the output is in brackets ("[]").
22 changes: 14 additions & 8 deletions trunk/Documentation/gpio.txt
@@ -109,17 +109,19 @@ use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders.

If you want to initialize a structure with an invalid GPIO number, use
some negative number (perhaps "-EINVAL"); that will never be valid. To
test if a number could reference a GPIO, you may use this predicate:
test if such a number from such a structure could reference a GPIO, you
may use this predicate:

int gpio_is_valid(int number);

A number that's not valid will be rejected by calls which may request
or free GPIOs (see below). Other numbers may also be rejected; for
example, a number might be valid but unused on a given board.

Whether a platform supports multiple GPIO controllers is currently a
platform-specific implementation issue.
example, a number might be valid but temporarily unused on a given board.

Whether a platform supports multiple GPIO controllers is a platform-specific
implementation issue, as are whether that support can leave "holes" in the space
of GPIO numbers, and whether new controllers can be added at runtime. Such issues
can affect things including whether adjacent GPIO numbers are both valid.
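The convention of marking an unused GPIO with a negative number can be sketched in userspace as follows. Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in: sketch_gpio_is_valid() is a simple range check written for this example, not the kernel's gpio_is_valid(), and SKETCH_NR_GPIOS is an arbitrary value standing in for ARCH_NR_GPIOS.

```c
#include <errno.h>

#define SKETCH_NR_GPIOS 256	/* hypothetical stand-in for ARCH_NR_GPIOS */

/* Stand-in predicate: a range check only, not a claim about any platform. */
static int sketch_gpio_is_valid(int number)
{
	return number >= 0 && number < SKETCH_NR_GPIOS;
}

/* Hypothetical platform data with an optional GPIO. */
struct sketch_pdata {
	int reset_gpio;		/* -EINVAL when the board has no reset line */
};

/* Returns 1 if the optional GPIO should be requested, 0 if it is absent. */
static int sketch_has_reset_gpio(const struct sketch_pdata *pdata)
{
	return sketch_gpio_is_valid(pdata->reset_gpio);
}
```

A number passing this kind of check could still be rejected when requested, for instance if it falls into a hole in the GPIO number space.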

Using GPIOs
-----------
@@ -480,12 +482,16 @@ To support this framework, a platform's Kconfig will "select" either
ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB
and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines
three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep().
They may also want to provide a custom value for ARCH_NR_GPIOS.

ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled
It may also provide a custom value for ARCH_NR_GPIOS, so that it better
reflects the number of GPIOs in actual use on that platform, without
wasting static table space. (It should count both built-in/SoC GPIOs and
also ones on GPIO expanders.)

ARCH_REQUIRE_GPIOLIB means that the gpiolib code will always get compiled
into the kernel on that architecture.

ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user
ARCH_WANT_OPTIONAL_GPIOLIB means the gpiolib code defaults to off and the user
can enable it and build it into the kernel optionally.

If neither of these options are selected, the platform does not support