Commit
Stanislav Fomichev says:

====================
This series implements two new per-cgroup hooks: getsockopt and
setsockopt along with a new sockopt program type. The idea is pretty
similar to recently introduced cgroup sysctl hooks, but
implementation is simpler (no need to convert to/from strings).

What this can be applied to:
* move business logic of what tos/priority/etc can be set by
  containers (either pass or reject)
* handle existing options (or introduce new ones) differently by
  propagating some information in cgroup/socket local storage

Compared to a simple syscall/{g,s}etsockopt tracepoint, those
hooks are context aware. Meaning, they can access underlying socket
and use cgroup and socket local storage.

v9:
* allow overwriting setsockopt arguments (Alexei Starovoitov)
  (see individual changes for more changelog details)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Showing 30 changed files with 2,415 additions and 10 deletions.
@@ -42,6 +42,7 @@ Program types
.. toctree::
   :maxdepth: 1

   prog_cgroup_sockopt
   prog_cgroup_sysctl
   prog_flow_dissector

@@ -0,0 +1,93 @@
.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) exposes the associated socket (``sk``)
and all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

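For orientation, the following userspace sketch shows the two system calls that would trigger these hooks and how their arguments line up with the context fields named above. The helper name ``tos_roundtrip`` and the choice of ``IP_TOS`` on a UDP socket are illustrative assumptions; no BPF program is attached here, so only the plain kernel behaviour is exercised.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set IP_TOS on a fresh UDP socket and read it back.
 * Returns the value reported by getsockopt(), or -1 on any failure.
 */
int tos_roundtrip(int tos)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* level = IPPROTO_IP, optname = IP_TOS, optval = &tos, optlen = sizeof(tos) */
	if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0) {
		close(fd);
		return -1;
	}

	int got = 0;
	socklen_t len = sizeof(got);

	/* Same level/optname; the kernel fills optval and updates optlen. */
	if (getsockopt(fd, IPPROTO_IP, IP_TOS, &got, &len) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return got;
}
```

With a ``BPF_CGROUP_SETSOCKOPT`` or ``BPF_CGROUP_GETSOCKOPT`` program attached to the caller's cgroup, each of these calls would additionally pass through that program.
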
BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handles the
socket option, and it has a writable context: the program can modify the
supplied arguments before they are passed down to the kernel. This hook has
access to the cgroup and socket local storage.

If a BPF program sets ``optlen`` to -1, control returns to userspace
after all other BPF programs in the cgroup chain finish (i.e. the kernel
``setsockopt`` handling is *not* executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value
triggers ``EFAULT``.

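The ``optlen`` rule can be written down as a small userspace model. This is a sketch of the rule as stated in the text, not the kernel implementation; the function name and the ``enum setsockopt_outcome`` values are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcome codes for this model; not kernel identifiers. */
enum setsockopt_outcome {
	RUN_KERNEL_HANDLER,	/* pass the (possibly shortened) optval to the kernel */
	SKIP_KERNEL_HANDLER,	/* optlen == -1: return to userspace directly */
};

/*
 * user_optlen: length supplied by the caller of setsockopt()
 * bpf_optlen:  length left in the context after the BPF programs ran
 * Returns 0 and fills *outcome, or -EFAULT for a length the text disallows.
 */
int model_setsockopt_optlen(int user_optlen, int bpf_optlen,
			    enum setsockopt_outcome *outcome)
{
	if (bpf_optlen == -1) {
		*outcome = SKIP_KERNEL_HANDLER;
		return 0;
	}

	/* Only "unchanged or decreased" is acceptable besides -1. */
	if (bpf_optlen < 0 || bpf_optlen > user_optlen)
		return -EFAULT;

	*outcome = RUN_KERNEL_HANDLER;
	return 0;
}
```

For a caller that passed ``optlen == 4``, the model accepts 4, 2 and -1, and yields ``-EFAULT`` for 8 or -2.
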
Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success; continue with the next BPF program in the cgroup chain.

BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handles the
socket option. The BPF hook can observe ``optval``, ``optlen`` and
``retval`` if it is interested in what the kernel has returned. It can
also override these values: replace ``optval``, adjust ``optlen`` and reset
``retval`` to 0. If ``optlen`` is increased above the initial
``getsockopt`` value (i.e. the userspace buffer is too small), ``EFAULT``
is returned.

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values for ``retval`` are 0 and the
original value that the kernel returned. Any other value triggers
``EFAULT``.

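The two ``EFAULT`` conditions for this hook can be summarised in a userspace model. As before, this is only a sketch of the rules stated in the text under a hypothetical function name; it says nothing about how the kernel treats cases the text does not mention, such as a negative ``optlen``.

```c
#include <errno.h>

/*
 * user_optlen:   buffer size the caller passed to getsockopt()
 * kernel_retval: what the kernel handler returned
 * bpf_optlen, bpf_retval: values left in the context by the BPF programs
 * Returns the value the syscall would return under this model.
 */
int model_getsockopt_result(int user_optlen, int kernel_retval,
			    int bpf_optlen, int bpf_retval)
{
	/* optlen grew past the userspace buffer. */
	if (bpf_optlen > user_optlen)
		return -EFAULT;

	/* retval may only stay as the kernel set it or be reset to 0. */
	if (bpf_retval != 0 && bpf_retval != kernel_retval)
		return -EFAULT;

	return bpf_retval;
}
```

In this model a program may turn a kernel ``-ENOPROTOOPT`` into 0 (for example when it implements the option itself), but replacing it with ``-EINVAL`` results in ``-EFAULT``.
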
Return Type
-----------

* ``0`` - reject the syscall; ``EPERM`` is returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace and return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy, where each cgroup
has a ``BPF_CGROUP_GETSOCKOPT`` program attached with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When an application in cgroup B calls ``getsockopt``, the programs are
executed from the bottom up: B, then A. The first program (B) sees the
result of the kernel's ``getsockopt``. It can optionally adjust ``optval``
and ``optlen`` and reset ``retval`` to 0. Control then passes to the
second program (A), which sees the same context as B, including any
modifications B made.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if programs are attached to
A and B, the trigger order is B, then A. If B changes any of the
input arguments (``level``, ``optname``, ``optval``, ``optlen``),
the next program in the chain (A) sees those changes,
*not* the original ``setsockopt`` arguments. The potentially
modified values are then passed down to the kernel.

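The propagation of changes from B to A can be illustrated with a plain C simulation. Everything here is hypothetical: ``struct sockopt_ctx`` is a simplified stand-in for the real context, ``prog_b`` and ``prog_a`` are made-up policies, and ``run_chain`` assumes that every program runs and the verdicts are combined, which the text above does not spell out.

```c
#include <stddef.h>

/* Simplified stand-in for the context; field names follow the text. */
struct sockopt_ctx {
	int level;
	int optname;
	int optval;	/* a single int keeps the model small */
	int optlen;
};

/* A "program" returns 1 to allow or 0 to reject, as in the Return Type lists. */
typedef int (*sockopt_prog)(struct sockopt_ctx *ctx);

/* Hypothetical program on child cgroup B: clamps optval to 8. */
int prog_b(struct sockopt_ctx *ctx)
{
	if (ctx->optval > 8)
		ctx->optval = 8;
	return 1;
}

/* Hypothetical program on parent cgroup A: rejects values above 8. */
int prog_a(struct sockopt_ctx *ctx)
{
	return ctx->optval <= 8;
}

/*
 * Run the programs in array order (child first, then parent). Each program
 * sees the context as the previous one left it. Returns 1 only if every
 * program allowed the call.
 */
int run_chain(sockopt_prog *progs, size_t n, struct sockopt_ctx *ctx)
{
	int allowed = 1;

	for (size_t i = 0; i < n; i++)
		allowed &= progs[i](ctx);

	return allowed;
}
```

Running ``{ prog_b, prog_a }`` on a context with ``optval == 64`` succeeds, because A only ever sees the clamped value 8; running ``prog_a`` alone on the same input rejects it.
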
Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.