Merge branch 'Atomics for eBPF'
Brendan Jackman says:

====================

There's still one unresolved review comment from John [3], which I
will resolve with a follow-up patch.

Differences from v6->v7 [1]:

* Fixed riscv build error detected by 0-day robot.

Differences from v5->v6 [1]:

* Carried Björn Töpel's ack for RISC-V code, plus a couple more acks from
  Yonghong.

* Doc fixups.

* Trivial cleanups.

Differences from v4->v5 [1]:

* Fixed bogus type casts in interpreter that led to warnings from
  the 0day robot.

* Dropped feature-detection for Clang per Andrii's suggestion in [4].
  The selftests will now fail to build unless you have llvm-project
  commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed
  to support the no_alu32 tests.

* Carried some Acks from John and Yonghong.

* Dropped confusing usage of __atomic_exchange from prog_test in
  favour of __sync_lock_test_and_set (see the first sketch after this
  list).

* [Really] got rid of the whole forest of instruction macros
  (BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP
  to define all the instructions as we use them in the verifier
  tests (see the second sketch after this list). This makes the
  atomic ops less special in that API, and I don't think the
  resulting usage is actually any harder to read.
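For illustration, the builtin usage looks roughly like this in a test
BPF program (a sketch only; the names are not lifted from the
selftests):

  #include <linux/types.h>

  __u64 xchg64_value = 3;   /* a global, as it would be in a BPF prog */

  static inline __u64 do_xchg(__u64 new_val)
  {
          /* Stores new_val and returns the prior value; Clang's BPF
           * backend lowers this builtin to the new BPF_XCHG
           * instruction, and the builtin's name makes the
           * store-and-return-old semantics explicit.
           */
          return __sync_lock_test_and_set(&xchg64_value, new_val);
  }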
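And the consolidated macro has roughly this shape — the operation,
optionally ORed with BPF_FETCH, goes straight into the immediate:

  /* Atomic memory op, e.g. *(u64 *)(dst_reg + off16) += src_reg */
  #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)                  \
          ((struct bpf_insn) {                                    \
                  .code  = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
                  .dst_reg = DST,                                 \
                  .src_reg = SRC,                                 \
                  .off   = OFF,                                   \
                  .imm   = OP })

  /* e.g. in a verifier test: */
  BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_0, -8),
  BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),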

Differences from v3->v4 [1]:

* Added one Ack from Yonghong. He acked some other patches but those
  have now changed non-trivially so I didn't add those acks.

* Fixups to commit messages.

* Fixed disassembly and comments: first arg to atomic_fetch_* is a
  pointer.

* Improved prog_test efficiency. BPF progs are now all loaded in a
  single call, then the skeleton is re-used for each subtest.

* Dropped use of tools/build/feature in favour of a one-liner in the
  Makefile.

* Dropped the commit that created an emit_neg helper in the x86
  JIT. It's not used any more (it wasn't used in v3 either).

* Combined all the different filter.h macros (used to be
  BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc) into
  just BPF_ATOMIC32 and BPF_ATOMIC64.

* Removed some references to BPF_STX_XADD from tools/, samples/ and
  lib/ that I missed before.

Differences from v2->v3 [1]:

* More minor fixes and naming/comment changes

* Dropped atomic subtract: compilers can implement this by preceding
  an atomic add with a NEG instruction (which is what the x86 JIT did
  under the hood anyway); a sketch of that lowering follows this list.

* Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is
  no longer an architecture version bump. Instead a feature test is
  added to Kbuild - it builds a source file to check if Clang
  supports BPF atomics.

* Fixed the prog_test so it no longer breaks
  test_progs-no_alu32. This requires some ifdef acrobatics (sketched
  after this list) to avoid complicating the prog_tests model where
  the same userspace code exercises both the normal and no_alu32 BPF
  test objects, using the same skeleton header.
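A sketch of the subtract lowering, written with the filter.h-style
instruction macros (the register choices are illustrative): to
atomically subtract r2 from the u64 at (r3 + 0), negate a copy, then
atomically add it.

  BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),                      /* r1 = r2  */
  BPF_ALU64_IMM(BPF_NEG, BPF_REG_1, 0),                     /* r1 = -r1 */
  BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_3, BPF_REG_1, 0),
                                  /* lock *(u64 *)(r3 + 0) += r1 */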
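The acrobatics amount to guarding the atomics code in the shared prog
source, roughly like this (a sketch assuming the ENABLE_ATOMICS_TEST
macro mentioned above; `counter` and `old` are illustrative):

  __u64 counter;
  __u64 old;

  #ifdef ENABLE_ATOMICS_TEST
          /* built with -mcpu=v3: exercise the new instructions */
          old = __sync_fetch_and_add(&counter, 1);
  #else
          /* no_alu32 build: stub out; userspace skips this subtest */
          old = 0;
  #endif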

Differences from v1->v2 [1]:

* Fixed mistakes in the netronome driver

* Added sub, and, or, xor operations

* The above led to some refactors to keep things readable. (Maybe I
  should have just waited until I'd implemented these before starting
  the review...)

* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
  include the BPF_FETCH flag (see the encoding sketch after this list)

* Added a bit of documentation. Suggestions welcome for more places
  to dump this info...
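Concretely, these land in the uapi header (include/uapi/linux/bpf.h)
as new values for the immediate; both exchange operations include the
BPF_FETCH bit, since they always return the old value:

  /* atomic op type fields (stored in immediate) */
  #define BPF_FETCH       0x01    /* not an opcode on its own, used to build others */
  #define BPF_XCHG        (0xe0 | BPF_FETCH)      /* atomic exchange */
  #define BPF_CMPXCHG     (0xf0 | BPF_FETCH)      /* atomic compare-and-write */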

The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was
https://reviews.llvm.org/D72184).

This only includes a JIT implementation for x86_64 - I don't plan to
implement JIT support myself for other architectures.

Operations
==========

This patchset adds atomic operations to the eBPF instruction set. The
use-case that motivated this work was a trivial and efficient way to
generate globally-unique cookies in BPF progs, but I think it's
obvious that these features are pretty widely applicable. The
instructions that are added here can be summarised with this list of
kernel operations (a usage sketch follows the list):

* atomic[64]_[fetch_]add
* atomic[64]_[fetch_]and
* atomic[64]_[fetch_]or
* atomic[64]_xchg
* atomic[64]_cmpxchg
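For instance, the cookie use-case boils down to something like this
sketch (the section name and identifiers are illustrative, and it
needs a Clang with the BPF atomics support from llvm-project commit
286daafd6512, mentioned below):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  __u64 next_cookie;      /* one shared counter across all CPUs */

  SEC("tp/syscalls/sys_enter_getpid")
  int assign_cookie(void *ctx)
  {
          /* Lowered to a BPF_ATOMIC | BPF_FETCH add: every invocation,
           * on any CPU, observes a distinct pre-increment value.
           */
          __u64 cookie = __sync_fetch_and_add(&next_cookie, 1);

          bpf_printk("cookie %llu", cookie);
          return 0;
  }

  char _license[] SEC("license") = "GPL";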

The following are left out of scope for this effort:

* 16 and 8 bit operations
* Explicit memory barriers

Encoding
========

I originally planned to add new values for bpf_insn.opcode. This was
rather unpleasant: the opcode space has holes in it, but no entire
instruction class is free [2]. Yonghong Song had a better idea: use the
immediate field of the existing STX XADD instruction to encode the
operation. This works nicely, without breaking existing programs,
because the immediate field is currently reserved-must-be-zero, and
extra-nicely because BPF_ADD happens to be zero.

Note that this of course makes immediate-source atomic operations
impossible. It's hard to imagine a measurable speedup from such
instructions, and if it existed it would certainly not benefit x86,
which has no support for them.

The BPF_OP opcode fields are re-used in the immediate, and an
additional flag BPF_FETCH is used to mark instructions that should
fetch a pre-modification value from memory.

So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
breaking userspace builds), and where we previously had .imm = 0, we
now have .imm = BPF_ADD (which is 0).
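Concretely, the two spellings below produce bit-identical instructions
(a sketch using only uapi constants):

  #include <linux/bpf.h>

  /* lock *(u64 *)(r1 + 0) += r2, in both spellings: bit-identical,
   * since BPF_XADD == BPF_ATOMIC == 0xc0 and BPF_ADD == 0.
   */
  const struct bpf_insn old_way = {
          .code    = BPF_STX | BPF_DW | BPF_XADD,    /* legacy name */
          .dst_reg = BPF_REG_1,
          .src_reg = BPF_REG_2,
          .imm     = 0,                     /* was reserved-must-be-zero */
  };
  const struct bpf_insn new_way = {
          .code    = BPF_STX | BPF_DW | BPF_ATOMIC,  /* same 0xc0 mode */
          .dst_reg = BPF_REG_1,
          .src_reg = BPF_REG_2,
          .imm     = BPF_ADD,   /* == 0; OR in BPF_FETCH for the old value */
  };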

Operands
========

Reg-source eBPF instructions only have two operands, while these
atomic operations have up to four. To avoid needing to encode
additional operands:

- One of the input registers is re-used as an output register
  (e.g. atomic_fetch_add both reads from and writes to the source
  register).

- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
  the operands.

This approach also allows the new eBPF instructions to map directly
to single x86 instructions.
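In C-like pseudocode (a sketch of the semantics, not kernel code), the
register usage of the 64-bit variants is:

  #include <linux/types.h>

  /* BPF_ATOMIC | BPF_ADD | BPF_FETCH: src_reg is both input and output */
  static __u64 fetch_add(__u64 *ptr /* dst_reg + off */, __u64 *src)
  {
          __u64 old = *ptr;       /* the whole body executes atomically */

          *ptr = old + *src;
          *src = old;
          return old;
  }

  /* BPF_ATOMIC | BPF_CMPXCHG: R0 is implicitly the compare value and
   * receives the result
   */
  static __u64 cmpxchg(__u64 *ptr /* dst_reg + off */, __u64 *r0, __u64 src)
  {
          __u64 old = *ptr;       /* the whole body executes atomically */

          if (old == *r0)
                  *ptr = src;
          *r0 = old;
          return old;
  }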

[1] Previous iterations:
    v1: https://lore.kernel.org/bpf/20201123173202.1335708-1-jackmanb@google.com/
    v2: https://lore.kernel.org/bpf/20201127175738.1085417-1-jackmanb@google.com/
    v3: https://lore.kernel.org/bpf/X8kN7NA7bJC7aLQI@google.com/
    v4: https://lore.kernel.org/bpf/20201207160734.2345502-1-jackmanb@google.com/
    v5: https://lore.kernel.org/bpf/20201215121816.1048557-1-jackmanb@google.com/
    v6: https://lore.kernel.org/bpf/20210112154235.2192781-1-jackmanb@google.com/

[2] Visualisation of eBPF opcode space:
    https://gist.github.com/bjackman/00fdad2d5dfff601c1918bc29b16e778

[3] Comment from John about propagating bounds in verifier:
    https://lore.kernel.org/bpf/5fcf0fbcc8aa8_9ab320853@john-XPS-13-9370.notmuch/

[4] Mail from Andrii about not supporting old Clang in selftests:
    https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov committed Jan 15, 2021
2 parents bade5c5 + de94857 commit 7064a73
Showing 44 changed files with 1,466 additions and 212 deletions.
61 changes: 50 additions & 11 deletions Documentation/networking/filter.rst
@@ -1006,13 +1006,13 @@ Size modifier is one of ...

 Mode modifier is one of::

-  BPF_IMM   0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
-  BPF_ABS   0x20
-  BPF_IND   0x40
-  BPF_MEM   0x60
-  BPF_LEN   0x80 /* classic BPF only, reserved in eBPF */
-  BPF_MSH   0xa0 /* classic BPF only, reserved in eBPF */
-  BPF_XADD  0xc0 /* eBPF only, exclusive add */
+  BPF_IMM     0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
+  BPF_ABS     0x20
+  BPF_IND     0x40
+  BPF_MEM     0x60
+  BPF_LEN     0x80 /* classic BPF only, reserved in eBPF */
+  BPF_MSH     0xa0 /* classic BPF only, reserved in eBPF */
+  BPF_ATOMIC  0xc0 /* eBPF only, atomic operations */

 eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
 (BPF_IND | <size> | BPF_LD) which are used to access packet data.

@@ -1044,11 +1044,50 @@ Unlike classic BPF instruction set, eBPF has generic load/store operations::

   BPF_MEM | <size> | BPF_STX:  *(size *) (dst_reg + off) = src_reg
   BPF_MEM | <size> | BPF_ST:   *(size *) (dst_reg + off) = imm32
   BPF_MEM | <size> | BPF_LDX:  dst_reg = *(size *) (src_reg + off)
-  BPF_XADD | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
-  BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

-Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
-2 byte atomic increments are not supported.
+Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
+
+It also includes atomic operations, which use the immediate field for extra
+encoding::
+
+   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
+   .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
+
+The basic atomic operations supported are::
+
+    BPF_ADD
+    BPF_AND
+    BPF_OR
+    BPF_XOR
+
+Each of these has semantics equivalent to the ``BPF_ADD`` example, that is:
+the memory location addressed by ``dst_reg + off`` is atomically modified,
+with ``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in
+the immediate, then these operations also overwrite ``src_reg`` with the
+value that was in memory before it was modified.
+
+The more special operations are::
+
+    BPF_XCHG
+
+This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
+off``.
+
+    BPF_CMPXCHG
+
+This atomically compares the value addressed by ``dst_reg + off`` with
+``R0``. If they match, it is replaced with ``src_reg``. The value that was
+there before is loaded back to ``R0``.
+
+Note that 1 and 2 byte atomic operations are not supported.
+
+Except ``BPF_ADD`` *without* ``BPF_FETCH`` (for legacy reasons), all 4 byte
+atomic operations require alu32 mode. Clang enables this mode by default in
+architecture v3 (``-mcpu=v3``). For older versions it can be enabled with
+``-Xclang -target-feature -Xclang +alu32``.
+
+You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
+referring to the exclusive-add operation encoded when the immediate field is
+zero.

 eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
 of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single

7 changes: 3 additions & 4 deletions arch/arm/net/bpf_jit_32.c
@@ -1620,10 +1620,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
                 }
                 emit_str_r(dst_lo, tmp2, off, ctx, BPF_SIZE(code));
                 break;
-        /* STX XADD: lock *(u32 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_W:
-        /* STX XADD: lock *(u64 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_DW:
+        /* Atomic ops */
+        case BPF_STX | BPF_ATOMIC | BPF_W:
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
                 goto notyet;
         /* STX: *(size *)(dst + off) = src */
         case BPF_STX | BPF_MEM | BPF_W:

16 changes: 12 additions & 4 deletions arch/arm64/net/bpf_jit_comp.c
@@ -875,10 +875,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
                 }
                 break;

-        /* STX XADD: lock *(u32 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_W:
-        /* STX XADD: lock *(u64 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_DW:
+        case BPF_STX | BPF_ATOMIC | BPF_W:
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
+                if (insn->imm != BPF_ADD) {
+                        pr_err_once("unknown atomic op code %02x\n", insn->imm);
+                        return -EINVAL;
+                }
+
+                /* STX XADD: lock *(u32 *)(dst + off) += src
+                 * and
+                 * STX XADD: lock *(u64 *)(dst + off) += src
+                 */
+
                 if (!off) {
                         reg = dst;
                 } else {

11 changes: 8 additions & 3 deletions arch/mips/net/ebpf_jit.c
@@ -1423,8 +1423,8 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
         case BPF_STX | BPF_H | BPF_MEM:
         case BPF_STX | BPF_W | BPF_MEM:
         case BPF_STX | BPF_DW | BPF_MEM:
-        case BPF_STX | BPF_W | BPF_XADD:
-        case BPF_STX | BPF_DW | BPF_XADD:
+        case BPF_STX | BPF_W | BPF_ATOMIC:
+        case BPF_STX | BPF_DW | BPF_ATOMIC:
                 if (insn->dst_reg == BPF_REG_10) {
                         ctx->flags |= EBPF_SEEN_FP;
                         dst = MIPS_R_SP;
@@ -1438,7 +1438,12 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
                 src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
                 if (src < 0)
                         return src;
-                if (BPF_MODE(insn->code) == BPF_XADD) {
+                if (BPF_MODE(insn->code) == BPF_ATOMIC) {
+                        if (insn->imm != BPF_ADD) {
+                                pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+                                return -EINVAL;
+                        }
+
                         /*
                          * If mem_off does not fit within the 9 bit ll/sc
                          * instruction immediate field, use a temp reg.

25 changes: 20 additions & 5 deletions arch/powerpc/net/bpf_jit_comp64.c
@@ -683,10 +683,18 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
                 break;

         /*
-         * BPF_STX XADD (atomic_add)
+         * BPF_STX ATOMIC (atomic ops)
          */
-        /* *(u32 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_W:
+        case BPF_STX | BPF_ATOMIC | BPF_W:
+                if (insn->imm != BPF_ADD) {
+                        pr_err_ratelimited(
+                                "eBPF filter atomic op code %02x (@%d) unsupported\n",
+                                code, i);
+                        return -ENOTSUPP;
+                }
+
+                /* *(u32 *)(dst + off) += src */
+
                 /* Get EA into TMP_REG_1 */
                 EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
                 tmp_idx = ctx->idx * 4;
@@ -699,8 +707,15 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
                 /* we're done if this succeeded */
                 PPC_BCC_SHORT(COND_NE, tmp_idx);
                 break;
-        /* *(u64 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_DW:
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
+                if (insn->imm != BPF_ADD) {
+                        pr_err_ratelimited(
+                                "eBPF filter atomic op code %02x (@%d) unsupported\n",
+                                code, i);
+                        return -ENOTSUPP;
+                }
+                /* *(u64 *)(dst + off) += src */
+
                 EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
                 tmp_idx = ctx->idx * 4;
                 EMIT(PPC_RAW_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0));

20 changes: 16 additions & 4 deletions arch/riscv/net/bpf_jit_comp32.c
@@ -881,7 +881,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off,
         const s8 *rd = bpf_get_reg64(dst, tmp1, ctx);
         const s8 *rs = bpf_get_reg64(src, tmp2, ctx);

-        if (mode == BPF_XADD && size != BPF_W)
+        if (mode == BPF_ATOMIC && size != BPF_W)
                 return -1;

         emit_imm(RV_REG_T0, off, ctx);
@@ -899,7 +899,7 @@ static int emit_store_r64(const s8 *dst, const s8 *src, s16 off,
         case BPF_MEM:
                 emit(rv_sw(RV_REG_T0, 0, lo(rs)), ctx);
                 break;
-        case BPF_XADD:
+        case BPF_ATOMIC: /* Only BPF_ADD supported */
                 emit(rv_amoadd_w(RV_REG_ZERO, lo(rs), RV_REG_T0, 0, 0),
                      ctx);
                 break;
@@ -1260,7 +1260,6 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
         case BPF_STX | BPF_MEM | BPF_H:
         case BPF_STX | BPF_MEM | BPF_W:
         case BPF_STX | BPF_MEM | BPF_DW:
-        case BPF_STX | BPF_XADD | BPF_W:
                 if (BPF_CLASS(code) == BPF_ST) {
                         emit_imm32(tmp2, imm, ctx);
                         src = tmp2;
@@ -1271,8 +1270,21 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
                         return -1;
                 break;

+        case BPF_STX | BPF_ATOMIC | BPF_W:
+                if (insn->imm != BPF_ADD) {
+                        pr_info_once(
+                                "bpf-jit: not supported: atomic operation %02x ***\n",
+                                insn->imm);
+                        return -EFAULT;
+                }
+
+                if (emit_store_r64(dst, src, off, ctx, BPF_SIZE(code),
+                                   BPF_MODE(code)))
+                        return -1;
+                break;
+
         /* No hardware support for 8-byte atomics in RV32. */
-        case BPF_STX | BPF_XADD | BPF_DW:
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
                 /* Fallthrough. */

 notsupported:

16 changes: 12 additions & 4 deletions arch/riscv/net/bpf_jit_comp64.c
@@ -1027,10 +1027,18 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
                 emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
                 emit_sd(RV_REG_T1, 0, rs, ctx);
                 break;
-        /* STX XADD: lock *(u32 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_W:
-        /* STX XADD: lock *(u64 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_DW:
+        case BPF_STX | BPF_ATOMIC | BPF_W:
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
+                if (insn->imm != BPF_ADD) {
+                        pr_err("bpf-jit: not supported: atomic operation %02x ***\n",
+                               insn->imm);
+                        return -EINVAL;
+                }
+
+                /* atomic_add: lock *(u32 *)(dst + off) += src
+                 * atomic_add: lock *(u64 *)(dst + off) += src
+                 */
+
                 if (off) {
                         if (is_12b_int(off)) {
                                 emit_addi(RV_REG_T1, rd, off, ctx);

27 changes: 16 additions & 11 deletions arch/s390/net/bpf_jit_comp.c
@@ -1205,18 +1205,23 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
                 jit->seen |= SEEN_MEM;
                 break;
         /*
-         * BPF_STX XADD (atomic_add)
+         * BPF_ATOMIC
          */
-        case BPF_STX | BPF_XADD | BPF_W: /* *(u32 *)(dst + off) += src */
-                /* laal %w0,%src,off(%dst) */
-                EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W0, src_reg,
-                              dst_reg, off);
-                jit->seen |= SEEN_MEM;
-                break;
-        case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
-                /* laalg %w0,%src,off(%dst) */
-                EMIT6_DISP_LH(0xeb000000, 0x00ea, REG_W0, src_reg,
-                              dst_reg, off);
+        case BPF_STX | BPF_ATOMIC | BPF_DW:
+        case BPF_STX | BPF_ATOMIC | BPF_W:
+                if (insn->imm != BPF_ADD) {
+                        pr_err("Unknown atomic operation %02x\n", insn->imm);
+                        return -1;
+                }
+
+                /* *(u32/u64 *)(dst + off) += src
+                 *
+                 * BPF_W:  laal  %w0,%src,off(%dst)
+                 * BPF_DW: laalg %w0,%src,off(%dst)
+                 */
+                EMIT6_DISP_LH(0xeb000000,
+                              BPF_SIZE(insn->code) == BPF_W ? 0x00fa : 0x00ea,
+                              REG_W0, src_reg, dst_reg, off);
                 jit->seen |= SEEN_MEM;
                 break;
         /*

17 changes: 14 additions & 3 deletions arch/sparc/net/bpf_jit_comp_64.c
@@ -1366,12 +1366,18 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
                 break;
         }

-        /* STX XADD: lock *(u32 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_W: {
+        case BPF_STX | BPF_ATOMIC | BPF_W: {
                 const u8 tmp = bpf2sparc[TMP_REG_1];
                 const u8 tmp2 = bpf2sparc[TMP_REG_2];
                 const u8 tmp3 = bpf2sparc[TMP_REG_3];

+                if (insn->imm != BPF_ADD) {
+                        pr_err_once("unknown atomic op %02x\n", insn->imm);
+                        return -EINVAL;
+                }
+
+                /* lock *(u32 *)(dst + off) += src */
+
                 if (insn->dst_reg == BPF_REG_FP)
                         ctx->saw_frame_pointer = true;

@@ -1390,11 +1396,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
                 break;
         }
         /* STX XADD: lock *(u64 *)(dst + off) += src */
-        case BPF_STX | BPF_XADD | BPF_DW: {
+        case BPF_STX | BPF_ATOMIC | BPF_DW: {
                 const u8 tmp = bpf2sparc[TMP_REG_1];
                 const u8 tmp2 = bpf2sparc[TMP_REG_2];
                 const u8 tmp3 = bpf2sparc[TMP_REG_3];

+                if (insn->imm != BPF_ADD) {
+                        pr_err_once("unknown atomic op %02x\n", insn->imm);
+                        return -EINVAL;
+                }
+
                 if (insn->dst_reg == BPF_REG_FP)
                         ctx->saw_frame_pointer = true;
