Skip to content

Commit

Permalink
Daniel Borkmann says:
Browse files Browse the repository at this point in the history
====================
pull-request: bpf-next 2023-02-11

We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).

There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e5 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:

        [...]
        ice_set_netdev_features(netdev);
        netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
                               NETDEV_XDP_ACT_XSK_ZEROCOPY;
        ice_set_ops(netdev);
        [...]

Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/

The main changes are:

1) Add support for BPF trampoline on s390x which finally allows to remove many
   test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.

2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.

3) Add capability to export the XDP features supported by the NIC.
   Along with that, add a XDP compliance test tool,
   from Lorenzo Bianconi & Marek Majtyka.

4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
   from David Vernet.

5) Add a deep dive documentation about the verifier's register
   liveness tracking algorithm, from Eduard Zingerman.

6) Fix and follow-up cleanups for resolve_btfids to be compiled
   as a host program to avoid cross compile issues,
   from Jiri Olsa & Ian Rogers.

7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
   when testing on different NICs, from Jesper Dangaard Brouer.

8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.

9) Extend libbpf to add an option for when the perf buffer should
   wake up, from Jon Doron.

10) Follow-up fix on xdp_metadata selftest to just consume on TX
    completion, from Stanislav Fomichev.

11) Extend the kfuncs.rst document with description on kfunc
    lifecycle & stability expectations, from David Vernet.

12) Fix bpftool prog profile to skip attaching to offline CPUs,
    from Tonghao Zhang.

====================

Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Jakub Kicinski committed Feb 11, 2023
2 parents d12f9ad + 17bcd27 commit de42873
Show file tree
Hide file tree
Showing 152 changed files with 4,884 additions and 962 deletions.
25 changes: 18 additions & 7 deletions Documentation/bpf/bpf_design_QA.rst
Original file line number Diff line number Diff line change
Expand Up @@ -208,6 +208,10 @@ data structures and compile with kernel internal headers. Both of these
kernel internals are subject to change and can break with newer kernels
such that the program needs to be adapted accordingly.

New BPF functionality is generally added through the use of kfuncs instead of
new helpers. Kfuncs are not considered part of the stable API, and have their own
lifecycle expectations as described in :ref:`BPF_kfunc_lifecycle_expectations`.

Q: Are tracepoints part of the stable ABI?
------------------------------------------
A: NO. Tracepoints are tied to internal implementation details hence they are
Expand Down Expand Up @@ -236,8 +240,8 @@ A: NO. Classic BPF programs are converted into extend BPF instructions.

Q: Can BPF call arbitrary kernel functions?
-------------------------------------------
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.
A: NO. BPF programs can only call specific functions exposed as BPF helpers or
kfuncs. The set of available functions is defined for every program type.

Q: Can BPF overwrite arbitrary kernel memory?
---------------------------------------------
Expand All @@ -263,7 +267,12 @@ Q: New functionality via kernel modules?
Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?

A: NO.
A: Yes, through kfuncs and kptrs

The core BPF functionality such as program types, maps and helpers cannot be
added to by modules. However, modules can expose functionality to BPF programs
by exporting kfuncs (which may return pointers to module-internal data
structures as kptrs).

Q: Directly calling kernel function is an ABI?
----------------------------------------------
Expand All @@ -278,7 +287,8 @@ kernel functions have already been used by other kernel tcp
cc (congestion-control) implementations. If any of these kernel
functions has changed, both the in-tree and out-of-tree kernel tcp cc
implementations have to be changed. The same goes for the bpf
programs and they have to be adjusted accordingly.
programs and they have to be adjusted accordingly. See
:ref:`BPF_kfunc_lifecycle_expectations` for details.

Q: Attaching to arbitrary kernel functions is an ABI?
-----------------------------------------------------
Expand Down Expand Up @@ -340,6 +350,7 @@ compatibility for these features?

A: NO.

Unlike map value types, there are no stability guarantees for this case. The
whole API to work with allocated objects and any support for special fields
inside them is unstable (since it is exposed through kfuncs).
Unlike map value types, the API to work with allocated objects and any support
for special fields inside them is exposed through kfuncs, and thus has the same
lifecycle expectations as the kfuncs themselves. See
:ref:`BPF_kfunc_lifecycle_expectations` for details.
120 changes: 84 additions & 36 deletions Documentation/bpf/instruction-set.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,11 @@ eBPF Instruction Set Specification, v1.0

This document specifies version 1.0 of the eBPF instruction set.

Documentation conventions
=========================

For brevity, this document uses the type notion "u64", "u32", etc.
to mean an unsigned integer whose width is the specified number of bits.

Registers and calling convention
================================
Expand All @@ -30,20 +35,56 @@ Instruction encoding
eBPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64-bit immediate value
(imm64) after the basic instruction for a total of 128 bits.
* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
constant) value after the basic instruction for a total of 128 bits.

The basic instruction encoding is as follows, where MSB and LSB mean the most significant
bits and least significant bits, respectively:

============= ======= ======= ======= ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= ======= ======= ============
imm offset src_reg dst_reg opcode
============= ======= ======= ======= ============

**imm**
signed integer immediate value

The basic instruction encoding looks as follows:
**offset**
signed integer offset used with pointer arithmetic

============= ======= =============== ==================== ============
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
============= ======= =============== ==================== ============
immediate offset source register destination register opcode
============= ======= =============== ==================== ============
**src_reg**
the source register number (0-10), except where otherwise specified
(`64-bit immediate instructions`_ reuse this field for other purposes)

**dst_reg**
destination register number (0-10)

**opcode**
operation to perform

Note that most instructions do not use all of the fields.
Unused fields shall be cleared to zero.

As discussed below in `64-bit immediate instructions`_, a 64-bit immediate
instruction uses a 64-bit immediate value that is constructed as follows.
The 64 bits following the basic instruction contain a pseudo instruction
using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
and imm containing the high 32 bits of the immediate value.

================= ==================
64 bits (MSB) 64 bits (LSB)
================= ==================
basic instruction pseudo instruction
================= ==================

Thus the 64-bit immediate value is constructed as follows:

imm64 = (next_imm << 32) | imm

where 'next_imm' refers to the imm value of the pseudo instruction
following the basic instruction.

Instruction classes
-------------------

Expand Down Expand Up @@ -71,27 +112,32 @@ For arithmetic and jump instructions (``BPF_ALU``, ``BPF_ALU64``, ``BPF_JMP`` an
============== ====== =================
4 bits (MSB) 1 bit 3 bits (LSB)
============== ====== =================
operation code source instruction class
code source instruction class
============== ====== =================

The 4th bit encodes the source operand:
**code**
the operation code, whose meaning varies by instruction class

====== ===== ========================================
source value description
====== ===== ========================================
BPF_K 0x00 use 32-bit immediate as source operand
BPF_X 0x08 use 'src_reg' register as source operand
====== ===== ========================================
**source**
the source operand location, which unless otherwise specified is one of:

The four MSB bits store the operation code.
====== ===== ==============================================
source value description
====== ===== ==============================================
BPF_K 0x00 use 32-bit 'imm' value as source operand
BPF_X 0x08 use 'src_reg' register value as source operand
====== ===== ==============================================

**instruction class**
the instruction class (see `Instruction classes`_)

Arithmetic instructions
-----------------------

``BPF_ALU`` uses 32-bit wide operands while ``BPF_ALU64`` uses 64-bit wide operands for
otherwise identical operations.
The 'code' field encodes the operation as below:
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.

======== ===== ==========================================================
code value description
Expand Down Expand Up @@ -121,19 +167,21 @@ the destination register is unchanged whereas for ``BPF_ALU`` the upper

``BPF_ADD | BPF_X | BPF_ALU`` means::

dst_reg = (u32) dst_reg + (u32) src_reg;
dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``BPF_ADD | BPF_X | BPF_ALU64`` means::

dst_reg = dst_reg + src_reg
dst = dst + src

``BPF_XOR | BPF_K | BPF_ALU`` means::

dst_reg = (u32) dst_reg ^ (u32) imm32
dst = (u32) dst ^ (u32) imm32

``BPF_XOR | BPF_K | BPF_ALU64`` means::

dst_reg = dst_reg ^ imm32
dst = dst ^ imm32

Also note that the division and modulo operations are unsigned. Thus, for
``BPF_ALU``, 'imm' is first interpreted as an unsigned 32-bit value, whereas
Expand Down Expand Up @@ -167,11 +215,11 @@ Examples:

``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::

dst_reg = htole16(dst_reg)
dst = htole16(dst)

``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::

dst_reg = htobe64(dst_reg)
dst = htobe64(dst)

Jump instructions
-----------------
Expand Down Expand Up @@ -246,15 +294,15 @@ instructions that transfer data between a register and memory.

``BPF_MEM | <size> | BPF_STX`` means::

*(size *) (dst_reg + off) = src_reg
*(size *) (dst + offset) = src

``BPF_MEM | <size> | BPF_ST`` means::

*(size *) (dst_reg + off) = imm32
*(size *) (dst + offset) = imm32

``BPF_MEM | <size> | BPF_LDX`` means::

dst_reg = *(size *) (src_reg + off)
dst = *(size *) (src + offset)

Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``.

Expand Down Expand Up @@ -288,11 +336,11 @@ BPF_XOR 0xa0 atomic xor

``BPF_ATOMIC | BPF_W | BPF_STX`` with 'imm' = BPF_ADD means::

*(u32 *)(dst_reg + off16) += src_reg
*(u32 *)(dst + offset) += src

``BPF_ATOMIC | BPF_DW | BPF_STX`` with 'imm' = BPF ADD means::

*(u64 *)(dst_reg + off16) += src_reg
*(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there also is a modifier and
two complex atomic operations:
Expand All @@ -307,16 +355,16 @@ BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange

The ``BPF_FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``BPF_FETCH`` flag
is set, then the operation also overwrites ``src_reg`` with the value that
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value
addressed by ``dst_reg + off``.
The ``BPF_XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``BPF_CMPXCHG`` operation atomically compares the value addressed by
``dst_reg + off`` with ``R0``. If they match, the value addressed by
``dst_reg + off`` is replaced with ``src_reg``. In either case, the
value that was at ``dst_reg + off`` before the operation is zero-extended
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.

64-bit immediate instructions
Expand All @@ -329,7 +377,7 @@ There is currently only one such instruction.

``BPF_LD | BPF_DW | BPF_IMM`` means::

dst_reg = imm64
dst = imm64


Legacy BPF Packet access instructions
Expand Down
Loading

0 comments on commit de42873

Please sign in to comment.