yaml
---
r: 91421
b: refs/heads/master
c: f8303dd
h: refs/heads/master
i:
  91419: 1ac3865
v: v3
Paul Mackerras committed Feb 26, 2008
1 parent 4b7537f commit f850bed
Showing 1,005 changed files with 29,349 additions and 12,899 deletions.
2 changes: 1 addition & 1 deletion [refs]
@@ -1,2 +1,2 @@
---
refs/heads/master: 74b20dad1c4cc0fd13ceca62fbab808919e1a7ea
refs/heads/master: f8303dd3db57bd7ab2062985ad7a9e898a8ac423
2 changes: 2 additions & 0 deletions trunk/Documentation/00-INDEX
@@ -109,6 +109,8 @@ cpu-hotplug.txt
- document describing CPU hotplug support in the Linux kernel.
cpu-load.txt
- document describing how CPU load statistics are collected.
cpuidle/
- info on CPU_IDLE, CPU idle state management subsystem.
cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
cputopology.txt
3 changes: 2 additions & 1 deletion trunk/Documentation/atomic_ops.txt
@@ -186,7 +186,8 @@ If the atomic value v is not equal to u, this function adds a to v, and
returns non zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless requires explicit memory barriers around the operation.
atomic_add_unless requires explicit memory barriers around the operation
unless it fails (returns 0).

atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
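
The semantics described in this hunk can be sketched in user space with C11 atomics. This is only an illustration: the names add_unless and inc_not_zero are hypothetical stand-ins, not the kernel's atomic_t implementation, and C11's default sequentially consistent ordering differs from the barrier rules the kernel text states.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space sketch only; the kernel's atomic_t API is separate from this. */
static bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    /* Retry until the add lands or the value is seen to equal u. */
    while (old != u) {
        /* On failure, old is refreshed with the current value of *v. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;   /* added a to v: "non zero" in the text above */
    }
    return false;          /* v was equal to u: nothing was added */
}

/* Mirrors "atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)". */
static bool inc_not_zero(atomic_int *v)
{
    return add_unless(v, 1, 0);
}
```

A typical use of the inc_not_zero pattern is taking a reference on an object only while its count has not already dropped to zero.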

66 changes: 33 additions & 33 deletions trunk/Documentation/cgroups.txt
@@ -28,7 +28,7 @@ CONTENTS:
4. Questions

1. Control Groups
==========
=================

1.1 What are cgroups ?
----------------------
@@ -143,10 +143,10 @@ proliferation of such cgroups.

Also lets say that the administrator would like to give enhanced network
access temporarily to a student's browser (since it is night and the user
wants to do online gaming :) OR give one of the students simulation
wants to do online gaming :)) OR give one of the students simulation
apps enhanced CPU power,

With ability to write pids directly to resource classes, its just a
With ability to write pids directly to resource classes, it's just a
matter of :

# echo pid > /mnt/network/<new_class>/tasks
@@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup file system
containing the following files describing that cgroup:

- tasks: list of tasks (by pid) attached to that cgroup
- notify_on_release flag: run /sbin/cgroup_release_agent on exit?
- releasable flag: cgroup currently removeable?
- notify_on_release flag: run the release agent on exit?
- release_agent: the path to use for release notifications (this file
exists in the top cgroup only)

Other subsystems such as cpusets may add additional files in each
cgroup dir
cgroup dir.

New cgroups are created using the mkdir system call or shell
command. The properties of a cgroup, such as its flags, are
@@ -257,7 +260,7 @@ performance.
To allow access from a cgroup to the css_sets (and hence tasks)
that comprise it, a set of cg_cgroup_link objects form a lattice;
each cg_cgroup_link is linked into a list of cg_cgroup_links for
a single cgroup on its cont_link_list field, and a list of
a single cgroup on its cgrp_link_list field, and a list of
cg_cgroup_links for a single css_set on its cg_link_list.

Thus the set of tasks in a cgroup can be listed by iterating over
@@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code.
1.4 What does notify_on_release do ?
------------------------------------

*** notify_on_release is disabled in the current patch set. It will be
*** reactivated in a future patch in a less-intrusive manner

If the notify_on_release flag is enabled (1) in a cgroup, then
whenever the last task in the cgroup leaves (exits or attaches to
some other cgroup) and the last child cgroup of that cgroup
@@ -360,8 +360,8 @@ Now you want to do something with this cgroup.

In this directory you can find several files:
# ls
notify_on_release release_agent tasks
(plus whatever files are added by the attached subsystems)
notify_on_release releasable tasks
(plus whatever files added by the attached subsystems)

Now attach your shell to this cgroup:
# /bin/echo $$ > tasks
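
The same attach step can be sketched from a C program. This is a minimal illustration under stated assumptions: sketch_attach_pid is a hypothetical helper (not a kernel or libc API), and the caller supplies the path to the cgroup's tasks file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write one pid to a cgroup "tasks" file.
 * Returns 0 on success and -1 on failure.
 */
static int sketch_attach_pid(const char *tasks_path, long pid)
{
    FILE *f = fopen(tasks_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%ld\n", pid) > 0;

    /* stdio buffers the write, so a rejected write may only show up at fclose(). */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Passing the result of getpid() as the pid argument would correspond to the shell's `$$` in the command above.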
@@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the cgroup system.
Other fields in the cgroup_subsys object include:

- subsys_id: a unique array index for the subsystem, indicating which
entry in cgroup->subsys[] this subsystem should be
managing. Initialized by cgroup_register_subsys(); prior to this
it should be initialized to -1
entry in cgroup->subsys[] this subsystem should be managing.

- hierarchy: an index indicating which hierarchy, if any, this
subsystem is currently attached to. If this is -1, then the
subsystem is not attached to any hierarchy, and all tasks should be
considered to be members of the subsystem's top_cgroup. It should
be initialized to -1.
- name: should be initialized to a unique subsystem name. Should be
no longer than MAX_CGROUP_TYPE_NAMELEN.

- name: should be initialized to a unique subsystem name prior to
calling cgroup_register_subsystem. Should be no longer than
MAX_CGROUP_TYPE_NAMELEN
- early_init: indicate if the subsystem needs early initialization
at system boot.

Each cgroup object created by the system has an array of pointers,
indexed by subsystem id; this pointer is entirely managed by the
@@ -434,8 +428,6 @@ situation.
See kernel/cgroup.c for more details.

Subsystems can take/release the cgroup_mutex via the functions
cgroup_lock()/cgroup_unlock(), and can
take/release the callback_mutex via the functions
cgroup_lock()/cgroup_unlock().

Accessing a task's cgroup pointer may be done in the following ways:
@@ -444,7 +436,7 @@ Accessing a task's cgroup pointer may be done in the following ways:
- inside an rcu_read_lock() section via rcu_dereference()

3.3 Subsystem API
--------------------------
-----------------

Each subsystem should:

@@ -455,7 +447,8 @@ Each subsystem may export the following methods. The only mandatory
methods are create/destroy. Any others that are null are presumed to
be successful no-ops.

struct cgroup_subsys_state *create(struct cgroup *cont)
struct cgroup_subsys_state *create(struct cgroup_subsys *ss,
struct cgroup *cgrp)
(cgroup_mutex held by caller)

Called to create a subsystem state object for a cgroup. The
@@ -470,7 +463,7 @@ identified by the passed cgroup object having a NULL parent (since
it's the root of the hierarchy) and may be an appropriate place for
initialization code.

void destroy(struct cgroup *cont)
void destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
(cgroup_mutex held by caller)

The cgroup system is about to destroy the passed cgroup; the subsystem
@@ -481,7 +474,14 @@ cgroup->parent is still valid. (Note - can also be called for a
newly-created cgroup if an error occurs after this subsystem's
create() method has been called for the new cgroup).

int can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
(cgroup_mutex held by caller)

Called before checking the reference count on each subsystem. This may
be useful for subsystems which have some extra references even if
there are not tasks in the cgroup.

int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *task)
(cgroup_mutex held by caller)

@@ -492,8 +492,8 @@ unspecified task can be moved into the cgroup. Note that this isn't
called on a fork. If this method returns 0 (success) then this should
remain valid while the caller holds cgroup_mutex.

void attach(struct cgroup_subsys *ss, struct cgroup *cont,
struct cgroup *old_cont, struct task_struct *task)
void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup *old_cgrp, struct task_struct *task)

Called after the task has been attached to the cgroup, to allow any
post-attachment activity that requires memory allocations or blocking.
@@ -505,9 +505,9 @@ registration for all existing tasks.

void exit(struct cgroup_subsys *ss, struct task_struct *task)

Called during task exit
Called during task exit.

int populate(struct cgroup_subsys *ss, struct cgroup *cont)
int populate(struct cgroup_subsys *ss, struct cgroup *cgrp)

Called after creation of a cgroup to allow a subsystem to populate
the cgroup directory with file entries. The subsystem should make
@@ -516,7 +516,7 @@ include/linux/cgroup.h for details). Note that although this
method can return an error code, the error code is currently not
always handled well.

void post_clone(struct cgroup_subsys *ss, struct cgroup *cont)
void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)

Called at the end of cgroup_clone() to do any parameter
initialization which might be required before a task could attach. For
24 changes: 9 additions & 15 deletions trunk/Documentation/controllers/memory.txt
@@ -170,14 +170,14 @@ NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
mega or gigabytes.
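
As a rough illustration of the suffix convention, the following user-space sketch converts a suffixed string to bytes. The function name sketch_parse_size is hypothetical, and the sketch assumes binary multiples (1k = 1024 bytes), which is consistent with the 4194304 value shown in this hunk if the limit was written as 4M. It does not reproduce the kernel's own parsing code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "4M", "512k", "1G" or a plain number into bytes. */
static uint64_t sketch_parse_size(const char *text)
{
    char *end;
    uint64_t value = strtoull(text, &end, 10);

    /* Each case multiplies by 1024 and falls through to the next smaller unit. */
    switch (*end) {
    case 'g': case 'G':
        value <<= 10;
        /* fall through */
    case 'm': case 'M':
        value <<= 10;
        /* fall through */
    case 'k': case 'K':
        value <<= 10;
        break;
    default:
        break;  /* no suffix: the value is already in bytes */
    }
    return value;
}
```

The sketch does no overflow or trailing-garbage checking, which a real parser would need.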

# cat /cgroups/0/memory.limit_in_bytes
4194304 Bytes
4194304

NOTE: The interface has now changed to display the usage in bytes
instead of pages

We can check the usage:
# cat /cgroups/0/memory.usage_in_bytes
1216512 Bytes
1216512

A successful write to this file does not guarantee a successful set of
this limit to the value written into the file. This can be due to a
@@ -187,7 +187,7 @@ this file after a write to guarantee the value committed by the kernel.

# echo -n 1 > memory.limit_in_bytes
# cat memory.limit_in_bytes
4096 Bytes
4096
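
Writing 1 and reading back 4096 is what rounding up to a page boundary would produce on a system with 4096-byte pages. Both the page size and the attribution to page rounding are assumptions here, since the lines listing the possible causes are not visible in this hunk. A minimal sketch of that arithmetic, with hypothetical names:

```c
#include <stdint.h>

/* Assumed page size for illustration; real systems may use a different value. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round a byte count up to the next multiple of the assumed page size. */
static uint64_t sketch_round_up_to_page(uint64_t bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
}
```

Under these assumptions a request of 1 byte becomes 4096 and a request of 4097 bytes becomes 8192, which is why the text advises reading the file back after a write.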

The memory.failcnt field gives the number of times that the cgroup limit was
exceeded.
@@ -233,13 +233,6 @@ cgroup might have some charge associated with it, even though all
tasks have migrated away from it. Such charges are automatically dropped at
rmdir() if there are no tasks.

4.4 Choosing what to account -- Page Cache (unmapped) vs RSS (mapped)?

The type of memory accounted by the cgroup can be limited to just
mapped pages by writing "1" to memory.control_type field

echo -n 1 > memory.control_type

5. TODO

1. Add support for accounting huge pages (as a separate controller)
@@ -262,18 +255,19 @@ References
3. Emelianov, Pavel. Resource controllers based on process cgroups
http://lkml.org/lkml/2007/3/6/198
4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
http://lkml.org/lkml/2007/4/9/74
http://lkml.org/lkml/2007/4/9/78
5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
http://lkml.org/lkml/2007/5/30/244
6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
subsystem (v3), http://lwn.net/Articles/235534/
8. Singh, Balbir. RSS controller V2 test results (lmbench),
8. Singh, Balbir. RSS controller v2 test results (lmbench),
http://lkml.org/lkml/2007/5/17/232
9. Singh, Balbir. RSS controller V2 AIM9 results
9. Singh, Balbir. RSS controller v2 AIM9 results
http://lkml.org/lkml/2007/5/18/1
10. Singh, Balbir. Memory controller v6 results,
10. Singh, Balbir. Memory controller v6 test results,
http://lkml.org/lkml/2007/8/19/36
11. Singh, Balbir. Memory controller v6, http://lkml.org/lkml/2007/8/17/69
11. Singh, Balbir. Memory controller introduction (v6),
http://lkml.org/lkml/2007/8/17/69
12. Corbet, Jonathan, Controlling memory use in cgroups,
http://lwn.net/Articles/243795/
23 changes: 23 additions & 0 deletions trunk/Documentation/cpuidle/core.txt
@@ -0,0 +1,23 @@

Supporting multiple CPU idle levels in kernel

cpuidle

General Information:

Various CPUs today support multiple idle levels that are differentiated
by varying exit latencies and power consumption during idle.
cpuidle is a generic in-kernel infrastructure that separates
idle policy (governor) from idle mechanism (driver) and provides a
standardized infrastructure to support independent development of
governors and drivers.

cpuidle resides under drivers/cpuidle.

Boot options:
"cpuidle_sysfs_switch"
enables current_governor interface in /sys/devices/system/cpu/cpuidle/,
which can be used to switch governors at run time. This boot option
is meant for developer testing only. In normal usage, kernel picks the
best governor based on governor ratings.
SEE ALSO: sysfs.txt in this directory.
31 changes: 31 additions & 0 deletions trunk/Documentation/cpuidle/driver.txt
@@ -0,0 +1,31 @@


Supporting multiple CPU idle levels in kernel

cpuidle drivers




cpuidle driver hooks into the cpuidle infrastructure and handles the
architecture/platform dependent part of CPU idle states. Driver
provides the platform idle state detection capability and also
has mechanisms in place to support actual entry-exit into CPU idle states.

cpuidle driver initializes the cpuidle_device structure for each CPU device
and registers with cpuidle using cpuidle_register_device.

It can also support the dynamic changes (like battery <-> AC), by using
cpuidle_pause_and_lock, cpuidle_disable_device and cpuidle_enable_device,
cpuidle_resume_and_unlock.

Interfaces:
extern int cpuidle_register_driver(struct cpuidle_driver *drv);
extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
extern int cpuidle_register_device(struct cpuidle_device *dev);
extern void cpuidle_unregister_device(struct cpuidle_device *dev);

extern void cpuidle_pause_and_lock(void);
extern void cpuidle_resume_and_unlock(void);
extern int cpuidle_enable_device(struct cpuidle_device *dev);
extern void cpuidle_disable_device(struct cpuidle_device *dev);
29 changes: 29 additions & 0 deletions trunk/Documentation/cpuidle/governor.txt
@@ -0,0 +1,29 @@



Supporting multiple CPU idle levels in kernel

cpuidle governors




A cpuidle governor is a policy routine that decides which idle state to enter
at any given time. The cpuidle core invokes the governor through these callbacks:

* enable() to enable governor for a particular device
* disable() to disable governor for a particular device
* select() to select an idle state to enter
* reflect() called after returning from the idle state, which can be used
by the governor for some record keeping.
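
To make the role of select() concrete, here is a user-space sketch of one possible policy: pick the deepest idle state whose exit latency fits within a latency limit. Everything in it is hypothetical (the struct, its fields, the function name and the policy itself); it is not the kernel's struct cpuidle_governor or any shipped governor.

```c
#include <stddef.h>

/* Hypothetical description of one idle state, for illustration only. */
struct sketch_idle_state {
    unsigned int exit_latency_us;  /* time needed to leave the state */
    unsigned int power_mw;         /* power drawn while in the state */
};

/*
 * Return the index of the deepest state whose exit latency does not exceed
 * latency_limit_us. States are assumed to be ordered from shallowest to
 * deepest, and state 0 is used as the fallback. Returns -1 if count is 0.
 */
static int sketch_select(const struct sketch_idle_state *states, size_t count,
                         unsigned int latency_limit_us)
{
    if (count == 0)
        return -1;

    int best = 0;
    for (size_t i = 1; i < count; i++) {
        if (states[i].exit_latency_us > latency_limit_us)
            break;  /* deeper states are assumed to be slower still */
        best = (int)i;
    }
    return best;
}
```

A governor's reflect() callback could then record how long the CPU actually stayed idle, giving later select() calls data to refine the choice.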

More than one governor can be registered at the same time and
users can switch between governors using the sysfs interface (when enabled).
Support for multiple governors lets developers experiment easily
with different governors. By default, cpuidle selects the most suitable
governor based on your kernel configuration and platform.

Interfaces:
extern int cpuidle_register_governor(struct cpuidle_governor *gov);
extern void cpuidle_unregister_governor(struct cpuidle_governor *gov);
struct cpuidle_governor