Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:
 "Several notable changes this cycle:

   - Thread mode was merged. This will be used for cgroup2 support for
     CPU and possibly other controllers. Unfortunately, CPU controller
     cgroup2 support didn't make this pull request but most contentions
     have been resolved and the support is likely to be merged before
     the next merge window.

   - cgroup.stat now shows the number of descendant cgroups.

   - cpuset now can enable the easier-to-configure v2 behavior on v1
     hierarchy"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: Allow v2 behavior in v1 cgroup
  cgroup: Add mount flag to enable cpuset to use v2 behavior in v1 cgroup
  cgroup: remove unneeded checks
  cgroup: misc changes
  cgroup: short-circuit cset_cgroup_from_root() on the default hierarchy
  cgroup: re-use the parent pointer in cgroup_destroy_locked()
  cgroup: add cgroup.stat interface with basic hierarchy stats
  cgroup: implement hierarchy limits
  cgroup: keep track of number of descent cgroups
  cgroup: add comment to cgroup_enable_threaded()
  cgroup: remove unnecessary empty check when enabling threaded mode
  cgroup: update debug controller to print out thread mode information
  cgroup: implement cgroup v2 thread support
  cgroup: implement CSS_TASK_ITER_THREADED
  cgroup: introduce cgroup->dom_cgrp and threaded css_set handling
  cgroup: add @flags to css_task_iter_start() and implement CSS_TASK_ITER_PROCS
  cgroup: reorganize cgroup.procs / task write path
  cgroup: replace css_set walking populated test with testing cgrp->nr_populated_csets
  cgroup: distinguish local and children populated states
  cgroup: remove now unused list_head @pending in cgroup_apply_cftypes()
  ...
Linus Torvalds committed Sep 7, 2017
2 parents 9954d48 + b8d1b8e commit 608c1d3
Showing 13 changed files with 1,194 additions and 272 deletions.
221 changes: 203 additions & 18 deletions Documentation/cgroup-v2.txt
@@ -18,7 +18,9 @@ v1 is available under Documentation/cgroup-v1/.
1-2. What is cgroup?
2. Basic Operations
2-1. Mounting
2-2. Organizing Processes
2-2. Organizing Processes and Threads
2-2-1. Processes
2-2-2. Threads
2-3. [Un]populated Notification
2-4. Controlling Controllers
2-4-1. Enabling and Disabling
@@ -167,8 +169,11 @@ cgroup v2 currently supports the following mount options.
Delegation section for details.


Organizing Processes
--------------------
Organizing Processes and Threads
--------------------------------

Processes
~~~~~~~~~

Initially, only the root cgroup exists to which all processes belong.
A child cgroup can be created by creating a sub-directory::
@@ -219,6 +224,105 @@ is removed subsequently, " (deleted)" is appended to the path::
0::/test-cgroup/test-cgroup-nested (deleted)


Threads
~~~~~~~

cgroup v2 supports thread granularity for a subset of controllers to
support use cases requiring hierarchical resource distribution across
the threads of a group of processes. By default, all threads of a
process belong to the same cgroup, which also serves as the resource
domain to host resource consumptions which are not specific to a
process or thread. The thread mode allows threads to be spread across
a subtree while still maintaining the common resource domain for them.

Controllers which support thread mode are called threaded controllers.
The ones which don't are called domain controllers.

Marking a cgroup threaded makes it join the resource domain of its
parent as a threaded cgroup. The parent may be another threaded
cgroup whose resource domain is further up in the hierarchy. The root
of a threaded subtree, that is, the nearest ancestor which is not
threaded, is called threaded domain or thread root interchangeably and
serves as the resource domain for the entire subtree.
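
As a rough illustration of the definition above, the thread root of a
cgroup can be found by walking up the hierarchy until a cgroup which is
not threaded is reached. The sketch below models the hierarchy with a
hypothetical `Node` class; the class and function names are assumptions
made for illustration and are not kernel code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory model of a cgroup; not a kernel structure."""
    name: str
    threaded: bool = False
    parent: Optional["Node"] = None


def thread_root(cgrp: Node) -> Node:
    """Return the nearest ancestor (or the cgroup itself) which is not threaded."""
    node = cgrp
    while node.threaded and node.parent is not None:
        node = node.parent
    return node


# A (domain) - B (threaded) - C (threaded)
a = Node("A")
b = Node("B", threaded=True, parent=a)
c = Node("C", threaded=True, parent=b)
print(thread_root(c).name)
```

For a cgroup which is not threaded, the function returns the cgroup
itself, which matches the notion that a domain cgroup is its own
resource domain.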

Inside a threaded subtree, threads of a process can be put in
different cgroups and are not subject to the no internal process
constraint - threaded controllers can be enabled on non-leaf cgroups
whether they have threads in them or not.

As the threaded domain cgroup hosts all the domain resource
consumptions of the subtree, it is considered to have internal
resource consumptions whether there are processes in it or not and
can't have populated child cgroups which aren't threaded. Because the
root cgroup is not subject to no internal process constraint, it can
serve both as a threaded domain and a parent to domain cgroups.

The current operation mode or type of the cgroup is shown in the
"cgroup.type" file which indicates whether the cgroup is a normal
domain, a domain which is serving as the domain of a threaded subtree,
or a threaded cgroup.
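
A minimal sketch of how a userspace tool might interpret the contents
of "cgroup.type", using the four string values listed for this file in
the Core Interface Files section. The `CgroupType` enum and
`parse_cgroup_type()` helper are hypothetical names introduced only for
this example.

```python
from enum import Enum


class CgroupType(Enum):
    """String values documented for the cgroup.type file."""
    DOMAIN = "domain"
    DOMAIN_THREADED = "domain threaded"
    DOMAIN_INVALID = "domain invalid"
    THREADED = "threaded"


def parse_cgroup_type(text):
    """Map the contents of a cgroup.type file to a CgroupType member."""
    # Enum lookup by value raises ValueError for unknown strings.
    return CgroupType(text.strip())


print(parse_cgroup_type("domain threaded\n"))
```

Unknown strings raise `ValueError`, so a value added by a later kernel
is reported instead of being silently misclassified.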

On creation, a cgroup is always a domain cgroup and can be made
threaded by writing "threaded" to the "cgroup.type" file. The
operation is single direction::

# echo threaded > cgroup.type

Once threaded, the cgroup can't be made a domain again. To enable the
thread mode, the following conditions must be met.

- As the cgroup will join the parent's resource domain, the parent
must either be a valid (threaded) domain or a threaded cgroup.

- When the parent is an unthreaded domain, it must not have any domain
controllers enabled or populated domain children. The root is
exempt from this requirement.
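
The two conditions above can be sketched as a small predicate. This
only models the rules as stated in this document; the function name,
its parameters, and the treatment of the root as a plain "domain"
parent are assumptions, and the checks the kernel actually performs may
differ in detail.

```python
VALID_PARENT_TYPES = ("domain", "domain threaded", "threaded")


def can_become_threaded(parent_type, parent_is_root=False,
                        parent_domain_controllers=(),
                        parent_populated_domain_children=0):
    """Check the documented preconditions for writing "threaded" to cgroup.type."""
    # The parent must be a valid domain, a threaded domain, or a threaded cgroup.
    if parent_type not in VALID_PARENT_TYPES:
        return False
    # An unthreaded, non-root domain parent must have no domain controllers
    # enabled and no populated domain children.
    if parent_type == "domain" and not parent_is_root:
        return (not parent_domain_controllers
                and parent_populated_domain_children == 0)
    return True


print(can_become_threaded("domain"))
print(can_become_threaded("domain", parent_domain_controllers=("memory",)))
print(can_become_threaded("domain invalid"))
```

The "memory" entry stands in for any domain controller enabled in the
parent and is used here purely as an illustrative value.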

Topology-wise, a cgroup can be in an invalid state. Please consider
the following topology::

A (threaded domain) - B (threaded) - C (domain, just created)

C is created as a domain but isn't connected to a parent which can
host child domains. C can't be used until it is turned into a
threaded cgroup. "cgroup.type" file will report "domain invalid" in
these cases. Operations which fail due to invalid topology use
EOPNOTSUPP as the errno.

A domain cgroup is turned into a threaded domain when one of its child
cgroups becomes threaded or threaded controllers are enabled in the
"cgroup.subtree_control" file while there are processes in the cgroup.
A threaded domain reverts to a normal domain when the conditions
clear.

When read, "cgroup.threads" contains the list of the thread IDs of all
threads in the cgroup. Except that the operations are per-thread
instead of per-process, "cgroup.threads" has the same format and
behaves the same way as "cgroup.procs". While "cgroup.threads" can be
written to in any cgroup, as it can only move threads inside the same
threaded domain, its operations are confined inside each threaded
subtree.
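
Since "cgroup.threads" shares the newline-separated format of
"cgroup.procs", one reader can serve both files. The sketch below is a
hypothetical helper, not part of any cgroup library; it is exercised
against a stand-in file with made-up TIDs so that it runs without a
mounted cgroup2 hierarchy.

```python
import os
import tempfile


def read_ids(path):
    """Parse a newline-separated ID file such as cgroup.threads or cgroup.procs."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


# Demonstration against a stand-in file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as d:
    sample = os.path.join(d, "cgroup.threads")
    with open(sample, "w") as f:
        f.write("1201\n1202\n1207\n")
    tids = read_ids(sample)

print(tids)
```

On a real system the path would point at the "cgroup.threads" file of
the cgroup of interest.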

The threaded domain cgroup serves as the resource domain for the whole
subtree, and, while the threads can be scattered across the subtree,
all the processes are considered to be in the threaded domain cgroup.
"cgroup.procs" in a threaded domain cgroup contains the PIDs of all
processes in the subtree and is not readable in the subtree proper.
However, "cgroup.procs" can be written to from anywhere in the subtree
to migrate all threads of the matching process to the cgroup.

Only threaded controllers can be enabled in a threaded subtree. When
a threaded controller is enabled inside a threaded subtree, it only
accounts for and controls resource consumptions associated with the
threads in the cgroup and its descendants. All consumptions which
aren't tied to a specific thread belong to the threaded domain cgroup.

Because a threaded subtree is exempt from no internal process
constraint, a threaded controller must be able to handle competition
between threads in a non-leaf cgroup and its child cgroups. Each
threaded controller defines how such competitions are handled.


[Un]populated Notification
--------------------------

@@ -302,15 +406,15 @@ disabled if one or more children have it enabled.
No Internal Process Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Non-root cgroups can only distribute resources to their children when
they don't have any processes of their own. In other words, only
cgroups which don't contain any processes can have controllers enabled
in their "cgroup.subtree_control" files.
Non-root cgroups can distribute domain resources to their children
only when they don't have any processes of their own. In other words,
only domain cgroups which don't contain any processes can have domain
controllers enabled in their "cgroup.subtree_control" files.

This guarantees that, when a controller is looking at the part of the
hierarchy which has it enabled, processes are always only on the
leaves. This rules out situations where child cgroups compete against
internal processes of the parent.
This guarantees that, when a domain controller is looking at the part
of the hierarchy which has it enabled, processes are always only on
the leaves. This rules out situations where child cgroups compete
against internal processes of the parent.

The root cgroup is exempt from this restriction. Root contains
processes and anonymous resource consumption which can't be associated
@@ -334,10 +438,10 @@ Model of Delegation
~~~~~~~~~~~~~~~~~~~

A cgroup can be delegated in two ways. First, to a less privileged
user by granting write access of the directory and its "cgroup.procs"
and "cgroup.subtree_control" files to the user. Second, if the
"nsdelegate" mount option is set, automatically to a cgroup namespace
on namespace creation.
user by granting write access of the directory and its "cgroup.procs",
"cgroup.threads" and "cgroup.subtree_control" files to the user.
Second, if the "nsdelegate" mount option is set, automatically to a
cgroup namespace on namespace creation.

Because the resource control interface files in a given directory
control the distribution of the parent's resources, the delegatee
@@ -644,6 +748,29 @@ Core Interface Files

All cgroup core files are prefixed with "cgroup."

cgroup.type

A read-write single value file which exists on non-root
cgroups.

When read, it indicates the current type of the cgroup, which
can be one of the following values.

- "domain" : A normal valid domain cgroup.

- "domain threaded" : A threaded domain cgroup which is
serving as the root of a threaded subtree.

- "domain invalid" : A cgroup which is in an invalid state.
It can't be populated or have controllers enabled. It may
be allowed to become a threaded cgroup.

- "threaded" : A threaded cgroup which is a member of a
threaded subtree.

A cgroup can be turned into a threaded cgroup by writing
"threaded" to this file.

cgroup.procs
A read-write new-line separated values file which exists on
all cgroups.
@@ -658,9 +785,6 @@ All cgroup core files are prefixed with "cgroup."
the PID to the cgroup. The writer should match all of the
following conditions.

- Its euid is either root or must match either uid or suid of
the target process.

- It must have write access to the "cgroup.procs" file.

- It must have write access to the "cgroup.procs" file of the
@@ -669,6 +793,35 @@ All cgroup core files are prefixed with "cgroup."
When delegating a sub-hierarchy, write access to this file
should be granted along with the containing directory.

In a threaded cgroup, reading this file fails with EOPNOTSUPP
as all the processes belong to the thread root. Writing is
supported and moves every thread of the process to the cgroup.

cgroup.threads
A read-write new-line separated values file which exists on
all cgroups.

When read, it lists the TIDs of all threads which belong to
the cgroup one-per-line. The TIDs are not ordered and the
same TID may show up more than once if the thread got moved to
another cgroup and then back or the TID got recycled while
reading.

A TID can be written to migrate the thread associated with the
TID to the cgroup. The writer should match all of the
following conditions.

- It must have write access to the "cgroup.threads" file.

- The cgroup that the thread is currently in must be in the
same resource domain as the destination cgroup.

- It must have write access to the "cgroup.procs" file of the
common ancestor of the source and destination cgroups.

When delegating a sub-hierarchy, write access to this file
should be granted along with the containing directory.

cgroup.controllers
A read-only space separated values file which exists on all
cgroups.
@@ -701,6 +854,38 @@ All cgroup core files are prefixed with "cgroup."
1 if the cgroup or its descendants contains any live
processes; otherwise, 0.

cgroup.max.descendants
A read-write single value file. The default is "max".

Maximum allowed number of descendant cgroups.
If the actual number of descendants is equal or larger,
an attempt to create a new cgroup in the hierarchy will fail.

cgroup.max.depth
A read-write single value file. The default is "max".

Maximum allowed descent depth below the current cgroup.
If the actual descent depth is equal or larger,
an attempt to create a new child cgroup will fail.
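
The comparison described for both limit files can be sketched as
follows. The helper names are hypothetical, and the sketch checks a
single cgroup's limit only; it makes no claim about how the kernel
applies the limits across a hierarchy.

```python
def parse_limit(text):
    """Return None for "max" (no limit), otherwise the integer limit."""
    text = text.strip()
    return None if text == "max" else int(text)


def creation_allowed(current, limit):
    """Creation fails once the current count is equal to or larger than the limit."""
    return limit is None or current < limit


print(creation_allowed(3, parse_limit("max\n")))
print(creation_allowed(4, parse_limit("5\n")))
print(creation_allowed(5, parse_limit("5\n")))
```

The same predicate applies to "cgroup.max.depth" by passing the current
depth below the cgroup as `current`.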

cgroup.stat
A read-only flat-keyed file with the following entries:

nr_descendants
Total number of visible descendant cgroups.

nr_dying_descendants
Total number of dying descendant cgroups. A cgroup becomes
dying after being deleted by a user. The cgroup will remain
in dying state for some undefined time (which can depend
on system load) before being completely destroyed.

A process can't enter a dying cgroup under any circumstances,
and a dying cgroup can't revive.

A dying cgroup can consume system resources not exceeding
limits, which were active at the moment of cgroup deletion.
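
A flat-keyed file such as "cgroup.stat" holds one "key value" pair per
line, so it can be read into a dictionary. This is a minimal sketch
with a hypothetical helper name and made-up sample numbers; it assumes
every value is an integer, which holds for the two entries documented
above.

```python
def parse_flat_keyed(text):
    """Parse "key value" lines into a dict of integer values."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(None, 1)
        stats[key] = int(value)
    return stats


sample = "nr_descendants 12\nnr_dying_descendants 3\n"
stats = parse_flat_keyed(sample)
print(stats["nr_descendants"], stats["nr_dying_descendants"])
```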


Controllers
===========
68 changes: 64 additions & 4 deletions include/linux/cgroup-defs.h
@@ -74,6 +74,11 @@ enum {
* aren't writeable from inside the namespace.
*/
CGRP_ROOT_NS_DELEGATE = (1 << 3),

/*
* Enable cpuset controller in v1 cgroup to use v2 behavior.
*/
CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
};

/* cftype->flags */
@@ -172,6 +177,14 @@ struct css_set {
/* reference count */
refcount_t refcount;

/*
* For a domain cgroup, the following points to self. If threaded,
* to the matching cset of the nearest domain ancestor. The
* dom_cset provides access to the domain cgroup and its csses to
* which domain level resource consumptions should be charged.
*/
struct css_set *dom_cset;

/* the default cgroup associated with this css_set */
struct cgroup *dfl_cgrp;

@@ -200,6 +213,10 @@ struct css_set {
*/
struct list_head e_cset_node[CGROUP_SUBSYS_COUNT];

/* all threaded csets whose ->dom_cset points to this cset */
struct list_head threaded_csets;
struct list_head threaded_csets_node;

/*
* List running through all cgroup groups in the same hash
* slot. Protected by css_set_lock
@@ -261,13 +278,35 @@ struct cgroup {
*/
int level;

/* Maximum allowed descent tree depth */
int max_depth;

/*
* Keep track of total numbers of visible and dying descendant cgroups.
* Dying cgroups are cgroups which were deleted by a user,
* but still exist because someone else is holding a reference.
* max_descendants is a maximum allowed number of descendant cgroups.
*/
int nr_descendants;
int nr_dying_descendants;
int max_descendants;

/*
* Each non-empty css_set associated with this cgroup contributes
* one to populated_cnt. All children with non-zero popuplated_cnt
* of their own contribute one. The count is zero iff there's no
* task in this cgroup or its subtree.
* one to nr_populated_csets. The counter is zero iff this cgroup
* doesn't have any tasks.
*
* All children which have non-zero nr_populated_csets and/or
* nr_populated_children of their own contribute one to either
* nr_populated_domain_children or nr_populated_threaded_children
* depending on their type. Each counter is zero iff all cgroups
* of the type in the subtree proper don't have any tasks.
*/
int populated_cnt;
int nr_populated_csets;
int nr_populated_domain_children;
int nr_populated_threaded_children;

int nr_threaded_children; /* # of live threaded child cgroups */

struct kernfs_node *kn; /* cgroup kernfs entry */
struct cgroup_file procs_file; /* handle for "cgroup.procs" */
@@ -305,6 +344,15 @@ struct cgroup {
*/
struct list_head e_csets[CGROUP_SUBSYS_COUNT];

/*
* If !threaded, self. If threaded, it points to the nearest
* domain ancestor. Inside a threaded subtree, cgroups are exempt
* from process granularity and no-internal-task constraint.
* Domain level resource consumptions which aren't tied to a
* specific task are charged to the dom_cgrp.
*/
struct cgroup *dom_cgrp;

/*
* list of pidlists, up to two for each namespace (one for procs, one
* for tasks); created on demand.
@@ -491,6 +539,18 @@ struct cgroup_subsys {
*/
bool implicit_on_dfl:1;

/*
* If %true, the controller supports threaded mode on the default
* hierarchy. In a threaded subtree, both process granularity and
* no-internal-process constraint are ignored and a threaded
* controller should be able to handle that.
*
* Note that as an implicit controller is automatically enabled on
* all cgroups on the default hierarchy, it should also be
* threaded. implicit && !threaded is not supported.
*/
bool threaded:1;

/*
* If %false, this subsystem is properly hierarchical -
* configuration, resource accounting and restriction on a parent