Commit
yaml --- r: 94831 b: refs/heads/master c: c135b65 h: refs/heads/master i: 94829: 82d38fb 94827: 6ed749e 94823: 43333e7 94815: 105962c v: v3
Linus Torvalds committed Apr 29, 2008
1 parent f532da6 commit ae6fac2
Showing 656 changed files with 14,684 additions and 8,297 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
--- | ||
refs/heads/master: 65c0d4e54ae4b81d8c8bb685169e48306656bb5c | ||
refs/heads/master: c135b6592bd63925397e60425e0301f33f06c7a6 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
DMA attributes | ||
============== | ||
|
||
This document describes the semantics of the DMA attributes that are | ||
defined in linux/dma-attrs.h. | ||
|
||
DMA_ATTR_WRITE_BARRIER | ||
---------------------- | ||
|
||
DMA_ATTR_WRITE_BARRIER is a (write) barrier attribute for DMA. DMA | ||
to a memory region with the DMA_ATTR_WRITE_BARRIER attribute forces | ||
all pending DMA writes to complete, and thus provides a mechanism to | ||
strictly order DMA from a device across all intervening busses and | ||
bridges. This barrier is not specific to a particular type of | ||
interconnect; it applies to the system as a whole, and so its | ||
implementation must account for the idiosyncrasies of the system all | ||
the way from the DMA device to memory. | ||
|
||
As an example of a situation where DMA_ATTR_WRITE_BARRIER would be | ||
useful, suppose that a device does a DMA write to indicate that data is | ||
ready and available in memory. The DMA of the "completion indication" | ||
could race with data DMA. Mapping the memory used for completion | ||
indications with DMA_ATTR_WRITE_BARRIER would prevent the race. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
Device Whitelist Controller | ||
|
||
1. Description: | ||
|
||
Implement a cgroup to track and enforce open and mknod restrictions | ||
on device files. A device cgroup associates a device access | ||
whitelist with each cgroup. A whitelist entry has 4 fields. | ||
'type' is a (all), c (char), or b (block). 'all' means it applies | ||
to all types and all major and minor numbers. Major and minor are | ||
either an integer or * for all. Access is a composition of r | ||
(read), w (write), and m (mknod). | ||
|
||
The root device cgroup starts with rwm to 'all'. A child device | ||
cgroup gets a copy of the parent. Administrators can then remove | ||
devices from the whitelist or add new entries. A child cgroup can | ||
never receive a device access which is denied to its parent. However, | ||
when a device access is removed from a parent, it is not also | ||
removed from the child(ren). | ||
|
||
2. User Interface | ||
|
||
An entry is added using devices.allow, and removed using | ||
devices.deny. For instance | ||
|
||
echo 'c 1:3 mr' > /cgroups/1/devices.allow | ||
|
||
allows cgroup 1 to read and mknod the device usually known as | ||
/dev/null. Doing | ||
|
||
echo a > /cgroups/1/devices.deny | ||
|
||
will remove the default 'a *:* mrw' entry. | ||
|
||
3. Security | ||
|
||
Any task can move itself between cgroups. This clearly won't | ||
suffice, but we can decide the best way to adequately restrict | ||
movement as people get some experience with this. We may just want | ||
to require CAP_SYS_ADMIN, which at least is a separate bit from | ||
CAP_MKNOD. We may want to just refuse moving to a cgroup which | ||
isn't a descendant of the current one. Or we may want to use | ||
CAP_MAC_ADMIN, since we really are trying to lock down root. | ||
|
||
CAP_SYS_ADMIN is needed to modify the whitelist or move another | ||
task to a new cgroup. (Again we'll probably want to change that). | ||
|
||
A cgroup may not be granted more permissions than the cgroup's | ||
parent has. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
|
||
The Resource Counter | ||
|
||
The resource counter, declared at include/linux/res_counter.h, | ||
is supposed to facilitate the resource management by controllers | ||
by providing common stuff for accounting. | ||
|
||
This "stuff" includes the res_counter structure and routines | ||
to work with it. | ||
|
||
|
||
|
||
1. Crucial parts of the res_counter structure | ||
|
||
a. unsigned long long usage | ||
|
||
The usage value shows the amount of a resource that is consumed | ||
by a group at a given time. The units of measurement should be | ||
determined by the controller that uses this counter. E.g. it can | ||
be bytes, items or any other unit the controller operates on. | ||
|
||
b. unsigned long long max_usage | ||
|
||
The maximal value of the usage over time. | ||
|
||
This value is useful when gathering statistical information about | ||
the particular group, as it shows the actual resource requirements | ||
for a particular group, not just some usage snapshot. | ||
|
||
c. unsigned long long limit | ||
|
||
The maximum amount of the resource that the group may consume. If | ||
the group requests more, so that the usage value would exceed the | ||
limit, the resource allocation is rejected (see the next | ||
section). | ||
|
||
d. unsigned long long failcnt | ||
|
||
The failcnt stands for "failures counter". This is the number of | ||
resource allocation attempts that failed. | ||
|
||
e. spinlock_t lock | ||
|
||
Protects changes of the above values. | ||
|
||
|
||
|
||
2. Basic accounting routines | ||
|
||
a. void res_counter_init(struct res_counter *rc) | ||
|
||
Initializes the resource counter. As usual, should be the first | ||
routine called for a new counter. | ||
|
||
b. int res_counter_charge[_locked] | ||
(struct res_counter *rc, unsigned long val) | ||
|
||
When a resource is about to be allocated it has to be accounted | ||
with the appropriate resource counter (controller should determine | ||
which one to use on its own). This operation is called "charging". | ||
|
||
It is not very important which operation - resource allocation | ||
or charging - is performed first, but | ||
* if the allocation is performed first, this may create a | ||
temporary resource over-usage until the resource counter is | ||
charged; | ||
* if the charging is performed first, then it should be uncharged | ||
on the error path (if one is taken). | ||
|
||
c. void res_counter_uncharge[_locked] | ||
(struct res_counter *rc, unsigned long val) | ||
|
||
When a resource is released (freed) it should be de-accounted | ||
from the resource counter it was accounted to. This is called | ||
"uncharging". | ||
|
||
The _locked routines expect that the caller already holds the | ||
res_counter->lock. | ||
|
||
|
||
2.1 Other accounting routines | ||
|
||
There are more routines that may help you with common needs, like | ||
checking whether the limit is reached or resetting the max_usage | ||
value. They are all declared in include/linux/res_counter.h. | ||
|
||
|
||
|
||
3. Analyzing the resource counter registrations | ||
|
||
a. If the failcnt value constantly grows, this means that the counter's | ||
limit is too tight. Either the group is misbehaving and consumes too | ||
many resources, or the configuration is not suitable for the group | ||
and the limit should be increased. | ||
|
||
b. The max_usage value can be used to quickly tune the group. One may | ||
set the limits to maximal values and either load the container with | ||
a common pattern or leave it running for a while. After this the | ||
max_usage value shows the amount of the resource the container would | ||
require during its common activity. | ||
|
||
Setting the limit a bit above this value gives a pretty good | ||
configuration that works in most of the cases. | ||
|
||
c. If the max_usage is much less than the limit, but the failcnt value | ||
is growing, then the group tries to allocate a big chunk of resource | ||
at once. | ||
|
||
d. If the max_usage is much less than the limit, but the failcnt value | ||
is 0, then this group has been given a higher limit than it | ||
requires. It is better to lower the limit a bit, leaving more | ||
resource for other groups. | ||
|
||
|
||
|
||
4. Communication with the control groups subsystem (cgroups) | ||
|
||
All the resource controllers that are using cgroups and resource counters | ||
should provide files (in the cgroup filesystem) to work with the resource | ||
counter fields. They are recommended to adhere to the following rules: | ||
|
||
a. File names | ||
|
||
Field name File name | ||
--------------------------------------------------- | ||
usage usage_in_<unit_of_measurement> | ||
max_usage max_usage_in_<unit_of_measurement> | ||
limit limit_in_<unit_of_measurement> | ||
failcnt failcnt | ||
lock no file :) | ||
|
||
b. Reading from file should show the corresponding field value in the | ||
appropriate format. | ||
|
||
c. Writing to file | ||
|
||
Field Expected behavior | ||
---------------------------------- | ||
usage prohibited | ||
max_usage reset to usage | ||
limit set the limit | ||
failcnt reset to zero | ||
|
||
|
||
|
||
5. Usage example | ||
|
||
a. Declare a task group (take a look at cgroups subsystem for this) and | ||
fold a res_counter into it | ||
|
||
struct my_group { | ||
struct res_counter res; | ||
|
||
<other fields> | ||
}; | ||
|
||
b. Put hooks in resource allocation/release paths | ||
|
||
int alloc_something(...) | ||
{ | ||
if (res_counter_charge(res_counter_ptr, amount) < 0) | ||
return -ENOMEM; | ||
|
||
<allocate the resource and return to the caller> | ||
} | ||
|
||
void release_something(...) | ||
{ | ||
res_counter_uncharge(res_counter_ptr, amount); | ||
|
||
<release the resource> | ||
} | ||
|
||
In order to keep the usage value self-consistent, both the | ||
"res_counter_ptr" and the "amount" in release_something() should be | ||
the same as they were in the alloc_something() when the releasing | ||
resource was allocated. | ||
|
||
c. Provide a way to read res_counter values and set them (the cgroups | ||
subsystem can help with this too). | ||
|
||
d. Compile and run :) |
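
The accounting rules from sections 1, 2 and 4 can be modelled in userspace
with a short C sketch. This is only an illustration under stated assumptions,
not the code in include/linux/res_counter.h: the rc_model names are
hypothetical, the spinlock is left out (single-threaded use is assumed), a new
counter is assumed to start effectively unlimited, and an uncharge larger than
the current usage is assumed to clamp at zero.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of the fields described in section 1.
 * The lock is omitted; callers are assumed to be single-threaded. */
struct rc_model {
	unsigned long long usage;     /* amount currently consumed */
	unsigned long long max_usage; /* highest usage seen so far */
	unsigned long long limit;     /* usage may not exceed this */
	unsigned long long failcnt;   /* number of rejected charges */
};

void rc_model_init(struct rc_model *rc)
{
	assert(rc != NULL);
	rc->usage = 0;
	rc->max_usage = 0;
	rc->failcnt = 0;
	rc->limit = ULLONG_MAX; /* assumption: no limit until one is set */
}

/* Charge val units. Returns 0 on success, or -ENOMEM (and bumps failcnt)
 * if the charge would push usage above the limit. */
int rc_model_charge(struct rc_model *rc, unsigned long val)
{
	/* Compare by subtraction so that usage + val cannot overflow. */
	if (rc->usage > rc->limit || val > rc->limit - rc->usage) {
		rc->failcnt++;
		return -ENOMEM;
	}
	rc->usage += val;
	if (rc->usage > rc->max_usage)
		rc->max_usage = rc->usage;
	return 0;
}

/* Uncharge val units; clamps at zero rather than wrapping around. */
void rc_model_uncharge(struct rc_model *rc, unsigned long val)
{
	if (val > rc->usage)
		val = rc->usage;
	rc->usage -= val;
}

/* Write behaviour from the table in section 4c. */
void rc_model_reset_max(struct rc_model *rc)
{
	rc->max_usage = rc->usage; /* max_usage: reset to usage */
}

void rc_model_reset_failcnt(struct rc_model *rc)
{
	rc->failcnt = 0; /* failcnt: reset to zero */
}
```

For example, with the limit set to 100, charging 60 succeeds, a further
charge of 50 is rejected and increments failcnt, and a charge of 40 then
succeeds and brings both usage and max_usage to 100. Uncharging lowers usage
but leaves max_usage untouched until it is reset.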