---

yaml --- r: 91421 b: refs/heads/master c: f8303dd h: refs/heads/master i: 91419: 1ac3865 v: v3
git-mirror · Feb 26, 2008 · f850bed · f850bed
1 parent 4b7537f
commit f850bed
Show file tree

Hide file tree

Showing 1,005 changed files with 29,349 additions and 12,899 deletions.
diff --git a/[refs] b/[refs]
@@ -1,2 +1,2 @@
 ---
-refs/heads/master: 74b20dad1c4cc0fd13ceca62fbab808919e1a7ea
+refs/heads/master: f8303dd3db57bd7ab2062985ad7a9e898a8ac423
diff --git a/trunk/Documentation/00-INDEX b/trunk/Documentation/00-INDEX
@@ -109,6 +109,8 @@ cpu-hotplug.txt
 	- document describing CPU hotplug support in the Linux kernel.
 cpu-load.txt
 	- document describing how CPU load statistics are collected.
+cpuidle/
+	- info on CPU_IDLE, CPU idle state management subsystem.
 cpusets.txt
 	- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
 cputopology.txt

diff --git a/trunk/Documentation/atomic_ops.txt b/trunk/Documentation/atomic_ops.txt
@@ -186,7 +186,8 @@ If the atomic value v is not equal to u, this function adds a to v, and
 returns non zero. If v is equal to u then it returns zero. This is done as
 an atomic operation.
 
-atomic_add_unless requires explicit memory barriers around the operation.
+atomic_add_unless requires explicit memory barriers around the operation
+unless it fails (returns 0).
 
 atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
 

diff --git a/trunk/Documentation/cgroups.txt b/trunk/Documentation/cgroups.txt
@@ -28,7 +28,7 @@ CONTENTS:
 4. Questions
 
 1. Control Groups
-==========
+=================
 
 1.1 What are cgroups ?
 ----------------------
@@ -143,10 +143,10 @@ proliferation of such cgroups.
 
 Also lets say that the administrator would like to give enhanced network
 access temporarily to a student's browser (since it is night and the user
-wants to do online gaming :)  OR give one of the students simulation
+wants to do online gaming :))  OR give one of the students simulation
 apps enhanced CPU power,
 
-With ability to write pids directly to resource classes, its just a
+With ability to write pids directly to resource classes, it's just a
 matter of :
 
        # echo pid > /mnt/network/<new_class>/tasks
@@ -227,10 +227,13 @@ Each cgroup is represented by a directory in the cgroup file system
 containing the following files describing that cgroup:
 
  - tasks: list of tasks (by pid) attached to that cgroup
- - notify_on_release flag: run /sbin/cgroup_release_agent on exit?
+ - releasable flag: cgroup currently removeable?
+ - notify_on_release flag: run the release agent on exit?
+ - release_agent: the path to use for release notifications (this file
+   exists in the top cgroup only)
 
 Other subsystems such as cpusets may add additional files in each
-cgroup dir
+cgroup dir.
 
 New cgroups are created using the mkdir system call or shell
 command.  The properties of a cgroup, such as its flags, are
@@ -257,7 +260,7 @@ performance.
 To allow access from a cgroup to the css_sets (and hence tasks)
 that comprise it, a set of cg_cgroup_link objects form a lattice;
 each cg_cgroup_link is linked into a list of cg_cgroup_links for
-a single cgroup on its cont_link_list field, and a list of
+a single cgroup on its cgrp_link_list field, and a list of
 cg_cgroup_links for a single css_set on its cg_link_list.
 
 Thus the set of tasks in a cgroup can be listed by iterating over
@@ -271,9 +274,6 @@ for cgroups, with a minimum of additional kernel code.
 1.4 What does notify_on_release do ?
 ------------------------------------
 
-*** notify_on_release is disabled in the current patch set. It will be
-*** reactivated in a future patch in a less-intrusive manner
-
 If the notify_on_release flag is enabled (1) in a cgroup, then
 whenever the last task in the cgroup leaves (exits or attaches to
 some other cgroup) and the last child cgroup of that cgroup
@@ -360,8 +360,8 @@ Now you want to do something with this cgroup.
 
 In this directory you can find several files:
 # ls
-notify_on_release release_agent tasks
-(plus whatever files are added by the attached subsystems)
+notify_on_release releasable tasks
+(plus whatever files added by the attached subsystems)
 
 Now attach your shell to this cgroup:
 # /bin/echo $$ > tasks
@@ -404,19 +404,13 @@ with a subsystem id which will be assigned by the cgroup system.
 Other fields in the cgroup_subsys object include:
 
 - subsys_id: a unique array index for the subsystem, indicating which
-  entry in cgroup->subsys[] this subsystem should be
-  managing. Initialized by cgroup_register_subsys(); prior to this
-  it should be initialized to -1
+  entry in cgroup->subsys[] this subsystem should be managing.
 
-- hierarchy: an index indicating which hierarchy, if any, this
-  subsystem is currently attached to. If this is -1, then the
-  subsystem is not attached to any hierarchy, and all tasks should be
-  considered to be members of the subsystem's top_cgroup. It should
-  be initialized to -1.
+- name: should be initialized to a unique subsystem name. Should be
+  no longer than MAX_CGROUP_TYPE_NAMELEN.
 
-- name: should be initialized to a unique subsystem name prior to
-  calling cgroup_register_subsystem. Should be no longer than
-  MAX_CGROUP_TYPE_NAMELEN
+- early_init: indicate if the subsystem needs early initialization
+  at system boot.
 
 Each cgroup object created by the system has an array of pointers,
 indexed by subsystem id; this pointer is entirely managed by the
@@ -434,8 +428,6 @@ situation.
 See kernel/cgroup.c for more details.
 
 Subsystems can take/release the cgroup_mutex via the functions
-cgroup_lock()/cgroup_unlock(), and can
-take/release the callback_mutex via the functions
 cgroup_lock()/cgroup_unlock().
 
 Accessing a task's cgroup pointer may be done in the following ways:
@@ -444,7 +436,7 @@ Accessing a task's cgroup pointer may be done in the following ways:
 - inside an rcu_read_lock() section via rcu_dereference()
 
 3.3 Subsystem API
---------------------------
+-----------------
 
 Each subsystem should:
 
@@ -455,7 +447,8 @@ Each subsystem may export the following methods. The only mandatory
 methods are create/destroy. Any others that are null are presumed to
 be successful no-ops.
 
-struct cgroup_subsys_state *create(struct cgroup *cont)
+struct cgroup_subsys_state *create(struct cgroup_subsys *ss,
+				   struct cgroup *cgrp)
 (cgroup_mutex held by caller)
 
 Called to create a subsystem state object for a cgroup. The
@@ -470,7 +463,7 @@ identified by the passed cgroup object having a NULL parent (since
 it's the root of the hierarchy) and may be an appropriate place for
 initialization code.
 
-void destroy(struct cgroup *cont)
+void destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 (cgroup_mutex held by caller)
 
 The cgroup system is about to destroy the passed cgroup; the subsystem
@@ -481,7 +474,14 @@ cgroup->parent is still valid. (Note - can also be called for a
 newly-created cgroup if an error occurs after this subsystem's
 create() method has been called for the new cgroup).
 
-int can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
+void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
+(cgroup_mutex held by caller)
+
+Called before checking the reference count on each subsystem. This may
+be useful for subsystems which have some extra references even if
+there are not tasks in the cgroup.
+
+int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	       struct task_struct *task)
 (cgroup_mutex held by caller)
 
@@ -492,8 +492,8 @@ unspecified task can be moved into the cgroup. Note that this isn't
 called on a fork. If this method returns 0 (success) then this should
 remain valid while the caller holds cgroup_mutex.
 
-void attach(struct cgroup_subsys *ss, struct cgroup *cont,
-	    struct cgroup *old_cont, struct task_struct *task)
+void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+	    struct cgroup *old_cgrp, struct task_struct *task)
 
 Called after the task has been attached to the cgroup, to allow any
 post-attachment activity that requires memory allocations or blocking.
@@ -505,9 +505,9 @@ registration for all existing tasks.
 
 void exit(struct cgroup_subsys *ss, struct task_struct *task)
 
-Called during task exit
+Called during task exit.
 
-int populate(struct cgroup_subsys *ss, struct cgroup *cont)
+int populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
 
 Called after creation of a cgroup to allow a subsystem to populate
 the cgroup directory with file entries.  The subsystem should make
@@ -516,7 +516,7 @@ include/linux/cgroup.h for details).  Note that although this
 method can return an error code, the error code is currently not
 always handled well.
 
-void post_clone(struct cgroup_subsys *ss, struct cgroup *cont)
+void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
 
 Called at the end of cgroup_clone() to do any paramater
 initialization which might be required before a task could attach.  For

diff --git a/trunk/Documentation/controllers/memory.txt b/trunk/Documentation/controllers/memory.txt
@@ -170,14 +170,14 @@ NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
 mega or gigabytes.
 
 # cat /cgroups/0/memory.limit_in_bytes
-4194304 Bytes
+4194304
 
 NOTE: The interface has now changed to display the usage in bytes
 instead of pages
 
 We can check the usage:
 # cat /cgroups/0/memory.usage_in_bytes
-1216512 Bytes
+1216512
 
 A successful write to this file does not guarantee a successful set of
 this limit to the value written into the file.  This can be due to a
@@ -187,7 +187,7 @@ this file after a write to guarantee the value committed by the kernel.
 
 # echo -n 1 > memory.limit_in_bytes
 # cat memory.limit_in_bytes
-4096 Bytes
+4096
 
 The memory.failcnt field gives the number of times that the cgroup limit was
 exceeded.
@@ -233,13 +233,6 @@ cgroup might have some charge associated with it, even though all
 tasks have migrated away from it. Such charges are automatically dropped at
 rmdir() if there are no tasks.
 
-4.4 Choosing what to account  -- Page Cache (unmapped) vs RSS (mapped)?
-
-The type of memory accounted by the cgroup can be limited to just
-mapped pages by writing "1" to memory.control_type field
-
-echo -n 1 > memory.control_type
-
 5. TODO
 
 1. Add support for accounting huge pages (as a separate controller)
@@ -262,18 +255,19 @@ References
 3. Emelianov, Pavel. Resource controllers based on process cgroups
    http://lkml.org/lkml/2007/3/6/198
 4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
-   http://lkml.org/lkml/2007/4/9/74
+   http://lkml.org/lkml/2007/4/9/78
 5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
    http://lkml.org/lkml/2007/5/30/244
 6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
 7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
    subsystem (v3), http://lwn.net/Articles/235534/
-8. Singh, Balbir. RSS controller V2 test results (lmbench),
+8. Singh, Balbir. RSS controller v2 test results (lmbench),
    http://lkml.org/lkml/2007/5/17/232
-9. Singh, Balbir. RSS controller V2 AIM9 results
+9. Singh, Balbir. RSS controller v2 AIM9 results
    http://lkml.org/lkml/2007/5/18/1
-10. Singh, Balbir. Memory controller v6 results,
+10. Singh, Balbir. Memory controller v6 test results,
     http://lkml.org/lkml/2007/8/19/36
-11. Singh, Balbir. Memory controller v6, http://lkml.org/lkml/2007/8/17/69
+11. Singh, Balbir. Memory controller introduction (v6),
+    http://lkml.org/lkml/2007/8/17/69
 12. Corbet, Jonathan, Controlling memory use in cgroups,
     http://lwn.net/Articles/243795/
diff --git a/trunk/Documentation/cpuidle/core.txt b/trunk/Documentation/cpuidle/core.txt
@@ -0,0 +1,23 @@
+
+		Supporting multiple CPU idle levels in kernel
+
+				cpuidle
+
+General Information:
+
+Various CPUs today support multiple idle levels that are differentiated
+by varying exit latencies and power consumption during idle.
+cpuidle is a generic in-kernel infrastructure that separates
+idle policy (governor) from idle mechanism (driver) and provides a
+standardized infrastructure to support independent development of
+governors and drivers.
+
+cpuidle resides under drivers/cpuidle.
+
+Boot options:
+"cpuidle_sysfs_switch"
+enables current_governor interface in /sys/devices/system/cpu/cpuidle/,
+which can be used to switch governors at run time. This boot option
+is meant for developer testing only. In normal usage, kernel picks the
+best governor based on governor ratings.
+SEE ALSO: sysfs.txt in this directory.
diff --git a/trunk/Documentation/cpuidle/driver.txt b/trunk/Documentation/cpuidle/driver.txt
@@ -0,0 +1,31 @@
+
+
+		Supporting multiple CPU idle levels in kernel
+
+				cpuidle drivers
+
+
+
+
+cpuidle driver hooks into the cpuidle infrastructure and handles the
+architecture/platform dependent part of CPU idle states. Driver
+provides the platform idle state detection capability and also
+has mechanisms in place to support actual entry-exit into CPU idle states.
+
+cpuidle driver initializes the cpuidle_device structure for each CPU device
+and registers with cpuidle using cpuidle_register_device.
+
+It can also support the dynamic changes (like battery <-> AC), by using
+cpuidle_pause_and_lock, cpuidle_disable_device and cpuidle_enable_device,
+cpuidle_resume_and_unlock.
+
+Interfaces:
+extern int cpuidle_register_driver(struct cpuidle_driver *drv);
+extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
+extern int cpuidle_register_device(struct cpuidle_device *dev);
+extern void cpuidle_unregister_device(struct cpuidle_device *dev);
+
+extern void cpuidle_pause_and_lock(void);
+extern void cpuidle_resume_and_unlock(void);
+extern int cpuidle_enable_device(struct cpuidle_device *dev);
+extern void cpuidle_disable_device(struct cpuidle_device *dev);
diff --git a/trunk/Documentation/cpuidle/governor.txt b/trunk/Documentation/cpuidle/governor.txt
@@ -0,0 +1,29 @@
+
+
+
+		Supporting multiple CPU idle levels in kernel
+
+				cpuidle governors
+
+
+
+
+cpuidle governor is policy routine that decides what idle state to enter at
+any given time. cpuidle core uses different callbacks to the governor.
+
+* enable() to enable governor for a particular device
+* disable() to disable governor for a particular device
+* select() to select an idle state to enter
+* reflect() called after returning from the idle state, which can be used
+  by the governor for some record keeping.
+
+More than one governor can be registered at the same time and
+users can switch between drivers using /sysfs interface (when enabled).
+More than one governor part is supported for developers to easily experiment
+with different governors. By default, most optimal governor based on your
+kernel configuration and platform will be selected by cpuidle.
+
+Interfaces:
+extern int cpuidle_register_governor(struct cpuidle_governor *gov);
+extern void cpuidle_unregister_governor(struct cpuidle_governor *gov);
+struct cpuidle_governor