Skip to content

Commit

Permalink
Daniel Borkmann says:
Browse files Browse the repository at this point in the history
====================
pull-request: bpf-next 2023-04-13

We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).

The main changes are:

1) Rework BPF verifier log behavior and implement it as a rotating log
   by default with the option to retain old-style fixed log behavior,
   from Andrii Nakryiko.

2) Adds support for using {FOU,GUE} encap with an ipip device operating
   in collect_md mode and add a set of BPF kfuncs for controlling encap
   params, from Christian Ehrig.

3) Allow BPF programs to detect at load time whether a particular kfunc
   exists or not, and also add support for this in light skeleton,
   from Alexei Starovoitov.

4) Optimize hashmap lookups when key size is multiple of 4,
   from Anton Protopopov.

5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
   tasks to be stored in BPF maps, from David Vernet.

6) Add support for stashing local BPF kptr into a map value via
   bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
   for new cgroups, from Dave Marchevsky.

7) Fix BTF handling of is_int_ptr to skip modifiers to work around
   tracing issues where a program cannot be attached, from Feng Zhou.

8) Migrate a big portion of test_verifier unit tests over to
   test_progs -a verifier_* via inline asm to ease {read,debug}ability,
   from Eduard Zingerman.

9) Several updates to the instruction-set.rst documentation
   which is subject to future IETF standardization
   (https://lwn.net/Articles/926882/), from Dave Thaler.

10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
    known bits information propagation, from Daniel Borkmann.

11) Add skb bitfield compaction work related to BPF with the overall goal
    to make more of the sk_buff bits optional, from Jakub Kicinski.

12) BPF selftest cleanups for build id extraction which stand on its own
    from the upcoming integration work of build id into struct file object,
    from Jiri Olsa.

13) Add fixes and optimizations for xsk descriptor validation and several
    selftest improvements for xsk sockets, from Kal Conley.

14) Add BPF links for struct_ops and enable switching implementations
    of BPF TCP cong-ctls under a given name by replacing backing
    struct_ops map, from Kui-Feng Lee.

15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
    offset stack read as earlier Spectre checks cover this,
    from Luis Gerhorst.

16) Fix issues in copy_from_user_nofault() for BPF and other tracers
    to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
    and Alexei Starovoitov.

17) Add --json-summary option to test_progs in order for CI tooling to
    ease parsing of test results, from Manu Bretelle.

18) Batch of improvements and refactoring to prep for upcoming
    bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
    from Martin KaFai Lau.

19) Improve bpftool's visual program dump which produces the control
    flow graph in a DOT format by adding C source inline annotations,
    from Quentin Monnet.

20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
    the module name from BTF of the target and searching kallsyms of
    the correct module, from Viktor Malik.

21) Improve BPF verifier handling of '<const> <cond> <non_const>'
    to better detect whether in particular jmp32 branches are taken,
    from Yonghong Song.

22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
    A built-in cc or one from a kernel module is already able to write
    to app_limited, from Yixin Shen.

Conflicts:

Documentation/bpf/bpf_devel_QA.rst
  b7abcd9 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  0f10f64 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

include/net/ip_tunnels.h
  bc9d003 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
  ac931d4 ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/

net/bpf/test_run.c
  e5995bc ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
  294635a ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
====================

Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Jakub Kicinski committed Apr 13, 2023
2 parents 800e68c + 8c5c2a4 commit c2865b1
Show file tree
Hide file tree
Showing 356 changed files with 21,786 additions and 11,274 deletions.
20 changes: 12 additions & 8 deletions Documentation/bpf/bpf_devel_QA.rst
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,8 @@ into the bpf-next tree will make their way into net-next tree. net and
net-next are both run by David S. Miller. From there, they will go
into the kernel mainline tree run by Linus Torvalds. To read up on the
process of net and net-next being merged into the mainline tree, see
the `netdev-FAQ`_.
the documentation on netdev subsystem at
Documentation/process/maintainer-netdev.rst.



Expand All @@ -147,7 +148,8 @@ request)::
Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to?
---------------------------------------------------------------------------------

A: The process is the very same as described in the `netdev-FAQ`_,
A: The process is the very same as described in the netdev subsystem
documentation at Documentation/process/maintainer-netdev.rst,
so please read up on it. The subject line must indicate whether the
patch is a fix or rather "next-like" content in order to let the
maintainers know whether it is targeted at bpf or bpf-next.
Expand Down Expand Up @@ -206,8 +208,9 @@ ii) run extensive BPF test suite and
Once the BPF pull request was accepted by David S. Miller, then
the patches end up in net or net-next tree, respectively, and
make their way from there further into mainline. Again, see the
`netdev-FAQ`_ for additional information e.g. on how often they are
merged to mainline.
documentation for netdev subsystem at
Documentation/process/maintainer-netdev.rst for additional information
e.g. on how often they are merged to mainline.

Q: How long do I need to wait for feedback on my BPF patches?
-------------------------------------------------------------
Expand All @@ -230,7 +233,8 @@ Q: Are patches applied to bpf-next when the merge window is open?
-----------------------------------------------------------------
A: For the time when the merge window is open, bpf-next will not be
processed. This is roughly analogous to net-next patch processing,
so feel free to read up on the `netdev-FAQ`_ about further details.
so feel free to read up on the netdev docs at
Documentation/process/maintainer-netdev.rst about further details.

During those two weeks of merge window, we might ask you to resend
your patch series once bpf-next is open again. Once Linus released
Expand Down Expand Up @@ -394,7 +398,8 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up:
netdev@vger.kernel.org

The process in general is the same as on netdev itself, see also the
`netdev-FAQ`_.
the documentation on networking subsystem at
Documentation/process/maintainer-netdev.rst.

Q: Do you also backport to kernels not currently maintained as stable?
----------------------------------------------------------------------
Expand All @@ -410,7 +415,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well
What should I do?

A: The same rules apply as with netdev patch submissions in general, see
the `netdev-FAQ`_.
the netdev docs at Documentation/process/maintainer-netdev.rst.

Never add "``Cc: stable@vger.kernel.org``" to the patch description, but
ask the BPF maintainers to queue the patches instead. This can be done
Expand Down Expand Up @@ -684,7 +689,6 @@ when:


.. Links
.. _netdev-FAQ: https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
.. _selftests:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/

Expand Down
6 changes: 6 additions & 0 deletions Documentation/bpf/clang-notes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,12 @@ Arithmetic instructions
For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.

Jump instructions
=================

If ``-O0`` is used, Clang will generate the ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d)
instruction, which is not supported by the Linux kernel verifier.

Atomic operations
=================

Expand Down
30 changes: 10 additions & 20 deletions Documentation/bpf/cpumasks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -117,12 +117,7 @@ For example:
As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can
also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in
a map, the reference can be removed from the map with bpf_kptr_xchg(), or
opportunistically acquired with bpf_cpumask_kptr_get():

.. kernel-doc:: kernel/bpf/cpumask.c
:identifiers: bpf_cpumask_kptr_get

Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
opportunistically acquired using RCU:

.. code-block:: c
Expand All @@ -144,7 +139,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
/**
* A simple example tracepoint program showing how a
* struct bpf_cpumask * kptr that is stored in a map can
* be acquired using the bpf_cpumask_kptr_get() kfunc.
* be passed to kfuncs using RCU protection.
*/
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
Expand All @@ -158,26 +153,21 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
if (!v)
return -ENOENT;
bpf_rcu_read_lock();
/* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */
kptr = bpf_cpumask_kptr_get(&v->cpumask);
if (!kptr)
kptr = v->cpumask;
if (!kptr) {
/* If no bpf_cpumask was present in the map, it's because
* we're racing with another CPU that removed it with
* bpf_kptr_xchg() between the bpf_map_lookup_elem()
* above, and our call to bpf_cpumask_kptr_get().
* bpf_cpumask_kptr_get() internally safely handles this
* race, and will return NULL if the cpumask is no longer
* present in the map by the time we invoke the kfunc.
* above, and our load of the pointer from the map.
*/
bpf_rcu_read_unlock();
return -EBUSY;
}
/* Free the reference we just took above. Note that the
* original struct bpf_cpumask * kptr is still in the map. It will
* be freed either at a later time if another context deletes
* it from the map, or automatically by the BPF subsystem if
* it's still present when the map is destroyed.
*/
bpf_cpumask_release(kptr);
bpf_cpumask_setall(kptr);
bpf_rcu_read_unlock();
return 0;
}
Expand Down
129 changes: 101 additions & 28 deletions Documentation/bpf/instruction-set.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ Documentation conventions
=========================

For brevity, this document uses the type notion "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits.
to mean an unsigned integer whose width is the specified number of bits,
and "s32", etc. to mean a signed integer of the specified number of bits.

Registers and calling convention
================================
Expand Down Expand Up @@ -242,28 +243,58 @@ Jump instructions
otherwise identical operations.
The 'code' field encodes the operation as below:

======== ===== ========================= ============
code value description notes
======== ===== ========================= ============
BPF_JA 0x00 PC += off BPF_JMP only
BPF_JEQ 0x10 PC += off if dst == src
BPF_JGT 0x20 PC += off if dst > src unsigned
BPF_JGE 0x30 PC += off if dst >= src unsigned
BPF_JSET 0x40 PC += off if dst & src
BPF_JNE 0x50 PC += off if dst != src
BPF_JSGT 0x60 PC += off if dst > src signed
BPF_JSGE 0x70 PC += off if dst >= src signed
BPF_CALL 0x80 function call
BPF_EXIT 0x90 function / program return BPF_JMP only
BPF_JLT 0xa0 PC += off if dst < src unsigned
BPF_JLE 0xb0 PC += off if dst <= src unsigned
BPF_JSLT 0xc0 PC += off if dst < src signed
BPF_JSLE 0xd0 PC += off if dst <= src signed
======== ===== ========================= ============
======== ===== === =========================================== =========================================
code value src description notes
======== ===== === =========================================== =========================================
BPF_JA 0x0 0x0 PC += offset BPF_JMP only
BPF_JEQ 0x1 any PC += offset if dst == src
BPF_JGT 0x2 any PC += offset if dst > src unsigned
BPF_JGE 0x3 any PC += offset if dst >= src unsigned
BPF_JSET 0x4 any PC += offset if dst & src
BPF_JNE 0x5 any PC += offset if dst != src
BPF_JSGT 0x6 any PC += offset if dst > src signed
BPF_JSGE 0x7 any PC += offset if dst >= src signed
BPF_CALL 0x8 0x0 call helper function by address see `Helper functions`_
BPF_CALL 0x8 0x1 call PC += offset see `Program-local functions`_
BPF_CALL 0x8 0x2 call helper function by BTF ID see `Helper functions`_
BPF_EXIT 0x9 0x0 return BPF_JMP only
BPF_JLT 0xa any PC += offset if dst < src unsigned
BPF_JLE 0xb any PC += offset if dst <= src unsigned
BPF_JSLT 0xc any PC += offset if dst < src signed
BPF_JSLE 0xd any PC += offset if dst <= src signed
======== ===== === =========================================== =========================================

The eBPF program needs to store the return value into register R0 before doing a
BPF_EXIT.
``BPF_EXIT``.

Example:

``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::

if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of function calls exposed by the underlying platform.

Historically, each helper function was identified by an address
encoded in the imm field. The available helper functions may differ
for each program type, but address values are unique across all program types.

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the imm field, where the BTF ID
identifies the helper name and type.

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~
Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the call instruction, similar to
``BPF_JA``. A ``BPF_EXIT`` within the program-local function will return to
the caller.

Load and store instructions
===========================
Expand Down Expand Up @@ -385,14 +416,56 @@ and loaded back to ``R0``.
-----------------------------

Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
encoding for an extra imm64 value.

There is currently only one such instruction.

``BPF_LD | BPF_DW | BPF_IMM`` means::

dst = imm64

encoding defined in `Instruction encoding`_, and use the 'src' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
with opcode subtypes in the 'src' field, using new terms such as "map"
defined further below:

========================= ====== === ========================================= =========== ==============
opcode construction opcode src pseudocode imm type dst type
========================= ====== === ========================================= =========== ==============
BPF_IMM | BPF_DW | BPF_LD 0x18 0x0 dst = imm64 integer integer
BPF_IMM | BPF_DW | BPF_LD 0x18 0x1 dst = map_by_fd(imm) map fd map
BPF_IMM | BPF_DW | BPF_LD 0x18 0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data pointer
BPF_IMM | BPF_DW | BPF_LD 0x18 0x3 dst = var_addr(imm) variable id data pointer
BPF_IMM | BPF_DW | BPF_LD 0x18 0x4 dst = code_addr(imm) integer code pointer
BPF_IMM | BPF_DW | BPF_LD 0x18 0x5 dst = map_by_idx(imm) map index map
BPF_IMM | BPF_DW | BPF_LD 0x18 0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data pointer
========================= ====== === ========================================= =========== ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

Maps
~~~~

Maps are shared memory regions accessible by eBPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms. The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------
Expand Down
Loading

0 comments on commit c2865b1

Please sign in to comment.