Alexei Starovoitov says:

====================
pull-request: bpf-next 2022-03-21 v2

We've added 137 non-merge commits during the last 17 day(s) which contain
a total of 143 files changed, 7123 insertions(+), 1092 deletions(-).

The main changes are:

1) Custom SEC() handling in libbpf, from Andrii.

2) subskeleton support, from Delyan.

3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao.

4) Fix net.core.bpf_jit_harden race, from Hou.

5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub.

6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami.
   The arch specific bits will come later.

7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri.

8) Enable non-atomic allocations in local storage, from Joanne.

9) Various var_off ptr_to_btf_id fixed, from Kumar.

10) bpf_ima_file_hash helper, from Roberto.

11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits)
  selftests/bpf: Fix kprobe_multi test.
  Revert "rethook: x86: Add rethook x86 implementation"
  Revert "arm64: rethook: Add arm64 rethook implementation"
  Revert "powerpc: Add rethook support"
  Revert "ARM: rethook: Add rethook arm implementation"
  bpftool: Fix a bug in subskeleton code generation
  bpf: Fix bpf_prog_pack when PMU_SIZE is not defined
  bpf: Fix bpf_prog_pack for multi-node setup
  bpf: Fix warning for cast from restricted gfp_t in verifier
  bpf, arm: Fix various typos in comments
  libbpf: Close fd in bpf_object__reuse_map
  bpftool: Fix print error when show bpf map
  bpf: Fix kprobe_multi return probe backtrace
  Revert "bpf: Add support to inline bpf_get_func_ip helper on x86"
  bpf: Simplify check in btf_parse_hdr()
  selftests/bpf/test_lirc_mode2.sh: Exit with proper code
  bpf: Check for NULL return from bpf_get_btf_vmlinux
  selftests/bpf: Test skipping stacktrace
  bpf: Adjust BPF stack helper functions to accommodate skip > 0
  bpf: Select proper size for bpf_prog_pack
  ...
====================

Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
.. SPDX-License-Identifier: GPL-2.0 | ||
=================================== | ||
Running BPF programs from userspace | ||
=================================== | ||
|
||
This document describes the ``BPF_PROG_RUN`` facility for running BPF programs | ||
from userspace. | ||
|
||
.. contents::
    :local:
    :depth: 2
|
||
|
||
Overview | ||
-------- | ||
|
||
The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to | ||
execute a BPF program in the kernel and return the results to userspace. This | ||
can be used to unit test BPF programs against user-supplied context objects, and | ||
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue | ||
to be defined in the UAPI header, aliased to the same value. | ||
|
||
The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the | ||
following types: | ||
|
||
- ``BPF_PROG_TYPE_SOCKET_FILTER`` | ||
- ``BPF_PROG_TYPE_SCHED_CLS`` | ||
- ``BPF_PROG_TYPE_SCHED_ACT`` | ||
- ``BPF_PROG_TYPE_XDP`` | ||
- ``BPF_PROG_TYPE_SK_LOOKUP`` | ||
- ``BPF_PROG_TYPE_CGROUP_SKB`` | ||
- ``BPF_PROG_TYPE_LWT_IN`` | ||
- ``BPF_PROG_TYPE_LWT_OUT`` | ||
- ``BPF_PROG_TYPE_LWT_XMIT`` | ||
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` | ||
- ``BPF_PROG_TYPE_FLOW_DISSECTOR`` | ||
- ``BPF_PROG_TYPE_STRUCT_OPS`` | ||
- ``BPF_PROG_TYPE_RAW_TRACEPOINT`` | ||
- ``BPF_PROG_TYPE_SYSCALL`` | ||
|
||
When using the ``BPF_PROG_RUN`` command, userspace supplies an input context | ||
object and (for program types operating on network packets) a buffer containing | ||
the packet data that the BPF program will operate on. The kernel will then | ||
execute the program and return the results to userspace. Note that programs will | ||
not have any side effects while being run in this mode; in particular, packets | ||
will not actually be redirected or dropped, and the program return code will just be
returned to userspace. A separate mode for live execution of XDP programs is | ||
provided, documented separately below. | ||
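
The regular (non-live) invocation described above can be sketched from userspace roughly as follows. This is a minimal sketch, not a reference implementation: the helper names ``fill_prog_run_attr()`` and ``prog_run()`` are hypothetical, the program is assumed to be already loaded (so ``prog_fd`` comes from elsewhere), and the older ``BPF_PROG_TEST_RUN`` name is used because it is the alias found in older UAPI headers.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill the "test" member of union bpf_attr. */
void fill_prog_run_attr(union bpf_attr *attr, int prog_fd,
			const void *pkt, uint32_t pkt_len,
			void *out, uint32_t out_len, uint32_t repeat)
{
	assert(attr != NULL);
	/* Zero everything first so fields we do not use are not garbage. */
	memset(attr, 0, sizeof(*attr));
	attr->test.prog_fd = prog_fd;
	attr->test.data_in = (uint64_t)(uintptr_t)pkt;   /* input packet */
	attr->test.data_size_in = pkt_len;
	attr->test.data_out = (uint64_t)(uintptr_t)out;  /* packet after the run */
	attr->test.data_size_out = out_len;
	attr->test.repeat = repeat;
}

/*
 * Hypothetical helper: run the program once. Returns 0 on success and
 * stores the program's return code in *retval; returns -1 with errno
 * set if the syscall fails.
 */
int prog_run(int prog_fd, const void *pkt, uint32_t pkt_len,
	     void *out, uint32_t out_len, uint32_t *retval)
{
	union bpf_attr attr;

	fill_prog_run_attr(&attr, prog_fd, pkt, pkt_len, out, out_len, 1);
	if (syscall(__NR_bpf, BPF_PROG_TEST_RUN, &attr, sizeof(attr)) < 0)
		return -1;
	*retval = attr.test.retval;
	return 0;
}
```

For an XDP program, ``*retval`` would then hold a verdict such as ``XDP_PASS`` or ``XDP_DROP``, which in this mode is only reported and not acted upon.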
|
||
Running XDP programs in "live frame mode" | ||
----------------------------------------- | ||
|
||
The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs, | ||
which can be used to execute XDP programs in a way where packets will actually | ||
be processed by the kernel after the execution of the XDP program as if they | ||
arrived on a physical interface. This mode is activated by setting the | ||
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to | ||
``BPF_PROG_RUN``. | ||
|
||
Live frame mode is optimised for high-performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as the regular test
run mode. Specifically:
|
||
- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc). For this reason, a call that sets the ``data_out`` or ``ctx_out``
  attributes in the syscall parameters will be rejected in this mode. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
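
The recycling rule in the last item can be illustrated with a small userspace model. This is a toy sketch and not the kernel's implementation: every name in it (``toy_page``, ``toy_ctx``, ``toy_prog`` and the two size constants) is hypothetical, ``data_meta`` is left out for brevity, and the "program" is an ordinary C function standing in for an XDP program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 256	/* arbitrary size for this model */
#define TOY_HEADROOM   64	/* arbitrary headroom for this model */

struct toy_page {
	unsigned char buf[TOY_PAGE_SIZE];
	size_t pkt_len;		/* original packet length */
};

struct toy_ctx {
	unsigned char *data;
	unsigned char *data_end;
};

/* Setup: copy the user-supplied packet into the page exactly once. */
void toy_page_init(struct toy_page *pg, const void *pkt, size_t len)
{
	assert(len <= TOY_PAGE_SIZE - TOY_HEADROOM);
	memset(pg->buf, 0, sizeof(pg->buf));
	memcpy(pg->buf + TOY_HEADROOM, pkt, len);
	pg->pkt_len = len;
}

/* Recycle: reset the boundary pointers only; the bytes stay as they are. */
void toy_ctx_reset(struct toy_ctx *ctx, struct toy_page *pg)
{
	ctx->data = pg->buf + TOY_HEADROOM;
	ctx->data_end = ctx->data + pg->pkt_len;
}

/* Stand-in for an XDP program that rewrites the packet and trims its head. */
void toy_prog(struct toy_ctx *ctx)
{
	ctx->data[0] = 0xAA;	/* rewrite the first byte */
	ctx->data += 1;		/* move the start, as a head adjustment would */
}
```

In this model, running ``toy_prog()`` and then calling ``toy_ctx_reset()`` restores the original packet length, but the first byte still reads ``0xAA``. A program that depends on the original contents therefore has to re-check or rewrite them on every invocation.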
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,174 @@ | ||
.. SPDX-License-Identifier: GPL-2.0 | ||
================================== | ||
Fprobe - Function entry/exit probe | ||
================================== | ||
|
||
.. Author: Masami Hiramatsu <mhiramat@kernel.org>

Introduction
============ | ||
|
||
Fprobe is a function entry/exit probe mechanism based on ftrace.
If you only want to attach callbacks on function entry and exit, similar to
kprobes and kretprobes, you can use fprobe instead of the full ftrace feature
set. Compared with kprobes and kretprobes, fprobe provides faster
instrumentation of multiple functions with a single handler. This document
describes how to use fprobe.
|
||
The usage of fprobe | ||
=================== | ||
|
||
Fprobe is a wrapper around ftrace (plus a kretprobe-like return callback) that
attaches callbacks to the entry and exit of multiple functions. The user needs
to set up a `struct fprobe` and pass it to `register_fprobe()`.
|
||
Typically, the `fprobe` data structure is initialized with the `entry_handler`
and/or `exit_handler` as below.

.. code-block:: c

   struct fprobe fp = {
           .entry_handler  = my_entry_callback,
           .exit_handler   = my_exit_callback,
   };

To enable the fprobe, call one of register_fprobe(), register_fprobe_ips(), or
register_fprobe_syms(). These functions register the fprobe using different
kinds of parameters.
|
||
register_fprobe() enables an fprobe by function-name filters.
For example, this enables @fp on all "func*()" functions except "func2()"::

   register_fprobe(&fp, "func*", "func2");
|
||
register_fprobe_ips() enables an fprobe by ftrace-location addresses.
For example:

.. code-block:: c

   unsigned long ips[] = { 0x.... };

   register_fprobe_ips(&fp, ips, ARRAY_SIZE(ips));

register_fprobe_syms() enables an fprobe by symbol names.
For example:

.. code-block:: c

   /* An array of symbol-name strings. */
   const char *syms[] = {"func1", "func2", "func3"};

   register_fprobe_syms(&fp, syms, ARRAY_SIZE(syms));

To disable this fprobe (remove it from the functions), call::

   unregister_fprobe(&fp);
|
||
You can temporarily (soft) disable the fprobe with::

   disable_fprobe(&fp);

and resume it with::

   enable_fprobe(&fp);

The above are declared in the header::

   #include <linux/fprobe.h>
|
||
As with ftrace, the registered callbacks will start being called some time
after register_fprobe() is called and before it returns. See
:file:`Documentation/trace/ftrace.rst`.

Also, unregister_fprobe() guarantees that neither the entry nor the exit
handler is called by functions after unregister_fprobe() returns, the same
as unregister_ftrace_function().
|
||
The fprobe entry/exit handler | ||
============================= | ||
|
||
The prototype of the entry/exit callback function is as follows:

.. code-block:: c

   void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);

Note that both entry and exit callbacks have the same prototype. The @entry_ip
is saved at function entry and passed to the exit handler.
|
||
@fp
   This is the address of the `fprobe` data structure related to this handler.
   You can embed the `fprobe` in your own data structure and retrieve it with
   the container_of() macro from @fp. The @fp must not be NULL.

@entry_ip
   This is the ftrace address of the traced function (both entry and exit).
   Note that this may not be the actual entry address of the function but
   the address where ftrace is instrumented.

@regs
   This is the `pt_regs` data structure at the entry and exit. Note that
   the instruction pointer of @regs may be different from the @entry_ip
   in the entry_handler. If you need the traced instruction pointer, you need
   to use @entry_ip. On the other hand, in the exit_handler, the instruction
   pointer of @regs is set to the correct return address.
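
The container_of() pattern mentioned for @fp can be sketched as below. To keep the sketch compilable outside the kernel, it declares a cut-down stand-in for `struct fprobe` and a local container_of() macro; in kernel code both come from the kernel headers, and the real structure has more fields. The wrapper type `struct my_probe` and its `hits` field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs;	/* opaque here; the handler below does not inspect it */

/* Cut-down stand-in for the kernel's struct fprobe (illustration only). */
struct fprobe {
	void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
			      struct pt_regs *regs);
	void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
			     struct pt_regs *regs);
	unsigned long nmissed;
};

/* Local equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical user structure that embeds the fprobe. */
struct my_probe {
	unsigned long hits;	/* per-probe private state */
	struct fprobe fp;
};

/* Entry handler: recover the enclosing structure from @fp. */
void my_entry_callback(struct fprobe *fp, unsigned long entry_ip,
		       struct pt_regs *regs)
{
	struct my_probe *mp;

	assert(fp != NULL);	/* @fp is never NULL */
	(void)entry_ip;
	(void)regs;
	mp = container_of(fp, struct my_probe, fp);
	mp->hits++;
}
```

Embedding the `fprobe` this way lets one handler serve several probe instances, each with its own private state, without any global lookup table.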
|
||
Share the callbacks with kprobes | ||
================================ | ||
|
||
Since the recursion safety of fprobe (and ftrace) is a bit different from
that of kprobes, this may cause an issue if the user wants to run the same
code from both fprobe and kprobes.

Kprobes has a per-cpu 'current_kprobe' variable which protects the kprobe
handler from recursion in all cases. On the other hand, fprobe uses
only ftrace_test_recursion_trylock(). This allows interrupt context to
call another (or the same) fprobe while the fprobe user handler is running.

This is not a problem if the common callback code has its own recursion
detection, or if it can handle recursion in the different contexts
(normal/interrupt/NMI).
But if it relies on the 'current_kprobe' recursion lock, it has to check
kprobe_running() and use the kprobe_busy_*() APIs.
|
||
Fprobe has the FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback
code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED
*before* registering the fprobe, like:

.. code-block:: c

   fprobe.flags = FPROBE_FL_KPROBE_SHARED;

   register_fprobe(&fprobe, "func*", NULL);

This will protect your common callback from nested calls.
|
||
The missed counter | ||
================== | ||
|
||
The `fprobe` data structure has an `fprobe::nmissed` counter field, the same as
kprobes.
This counter counts up when:

- fprobe fails to take the ftrace_recursion lock. This usually means that a
  function which is traced by other ftrace users is called from the
  entry_handler.

- fprobe fails to set up the function exit because of a shortage of rethook
  (the shadow stack for hooking the function return).

The `fprobe::nmissed` field counts up in both cases. The former skips both
the entry and exit callbacks and the latter skips only the exit callback,
but in both cases the counter increases by 1.

Note that if you set FTRACE_OPS_FL_RECURSION and/or FTRACE_OPS_FL_RCU in
`fprobe::ops::flags` (ftrace_ops::flags) when registering the fprobe, this
counter may not work correctly, because ftrace skips the fprobe function which
increases the counter.
|
||
|
||
Functions and structures | ||
======================== | ||
|
||
.. kernel-doc:: include/linux/fprobe.h | ||
.. kernel-doc:: kernel/trace/fprobe.c | ||
|