Merge branch 'bpf-link-support-for-tc-bpf-programs'
Daniel Borkmann says:

====================
BPF link support for tc BPF programs

This series adds BPF link support for tc BPF programs. We initially
presented the motivation, related work and design at last year's LPC
conference in the networking & BPF track [0], and a recent update on
our progress of the rework during this year's LSF/MM/BPF summit [1].
The main changes are in first two patches and the last two have an
extensive batch of test cases we developed along with it, please see
individual patches for details. We tested this series with tc-testing
selftest suite as well as BPF CI/selftests.

Thanks!

v5 -> v6:
  - Remove export symbol on tcx_inc/dec (Jakub)
  - Treat fd==0 as invalid (Stan, Alexei)

v4 -> v5:
  - Updated bpftool docs and usage of bpftool net (Quentin)
  - Consistent dump "prog id"/"link id" -> "prog_id"/"link_id" (Quentin)
  - Reworked bpftool flag output handling (Quentin)
  - LIBBPF_OPTS_RESET() macro with varargs for reinit (Andrii)
  - libbpf opts/link bail out on relative_fd && relative_id (Andrii)
  - libbpf improvements for assigning attr.relative_{id,fd} (Andrii)
  - libbpf sorting in libbpf.map (Andrii)
  - libbpf move ifindex to bpf_program__attach_tcx param (Andrii)
  - libbpf move BPF_F_ID flag handling to bpf_link_create (Andrii)
  - bpf_program_attach_fd with tcx instead of tc (Andrii)
  - Reworking kernel-internal bpf_mprog API (Alexei, Andrii)
  - Change "object" notation to "id_or_fd" (Andrii)
  - Remove on stack cpp[BPF_MPROG_MAX] and switch to memmove (Andrii)
  - Simplify bpf_mprog_{insert,delete} and add comment on internals
  - Get rid of BPF_MPROG_* return codes (Alexei, Andrii)

v3 -> v4:
  - Fix bpftool output to display tcx/{ingress,egress} (Stan)
  - Documentation around API, BPF_MPROG_* return codes and locking
    expectations (Stan, Alexei)
  - Change _after and _before to have the same semantics for return
    value (Alexei)
  - Rework mprog initialization and move allocation/free one layer up
    into tcx to simplify the code (Stan)
  - Add comment on synchronize_rcu and parent->ref (Stan)
  - Add comment on bpf_mprog_pos_() helpers wrt target position (Stan)

v2 -> v3:
  - Removal of BPF_F_FIRST/BPF_F_LAST from control UAPI (Toke, Stan)
  - Along with that full rework of bpf_mprog internals to simplify
    dependency management, looks much nicer now imho
  - Just single bpf_mprog_cp instead of two (Andrii)
  - atomic64_t for revision counter (Andrii)
  - Evaluate target position and reject on conflicts (Andrii)
  - Keep track of actual count in bpf_mprob_bundle (Andrii)
  - Make combo of REPLACE and BEFORE/AFTER work (Andrii)
  - Moved miniq as first struct member (Jamal)
  - Rework tcx_link_attach with regards to rtnl (Jakub, Andrii)
  - Moved wrappers after bpf_prog_detach_ops (Andrii)
  - Removed union for relative_fd and friends for opts and link in
    libbpf (Andrii)
  - Add doc comments to attach/detach/query libbpf APIs (Andrii)
  - Dropped SEC_ATTACHABLE_OPT (Andrii)
  - Add an OPTS_ZEROED check to bpf_link_create (Andrii)
  - Keep opts as the last argument in bpf_program_attach_fd (Andrii)
  - Rework bpf_program_attach_fd (Andrii)
  - Remove OPTS_GET before we checked OPTS_VALID in
    bpf_program__attach_tcx (Andrii)
  - Add `size_t :0;` to prevent compiler from leaving garbage (Andrii)
  - Add helper macro to clear opts structs which I found useful
    when writing tests
  - Rework of both opts and link test cases to accommodate for changes

v1 -> v2:
  - Rework of almost entire series to remove prio from UAPI and switch to
    better control directives BPF_F_FIRST/BPF_F_LAST/BPF_F_BEFORE/
    BPF_F_AFTER (Alexei, Toke, Stan, Andrii)
  - Addition of big test suite to cover all corner cases

  [0] https://lpc.events/event/16/contributions/1353/
  [1] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
====================

Link: https://lore.kernel.org/r/20230719140858.13224-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Showing 31 changed files with 6,069 additions and 241 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,327 @@ | ||
/* SPDX-License-Identifier: GPL-2.0 */ | ||
/* Copyright (c) 2023 Isovalent */ | ||
#ifndef __BPF_MPROG_H | ||
#define __BPF_MPROG_H | ||
|
||
#include <linux/bpf.h> | ||
|
||
/* bpf_mprog framework: | ||
* | ||
* bpf_mprog is a generic layer for multi-program attachment. In-kernel users | ||
* of bpf_mprog don't need to care about the dependency resolution internals; | ||
* they can just consume it with a few API calls. The currently available | ||
* dependency directives are BPF_F_{BEFORE,AFTER}, which enable insertion of | ||
* a BPF program or BPF link relative to an existing BPF program or BPF link | ||
* inside the multi-program array, as well as prepend and append behavior if | ||
* no relative object was specified. See the corresponding selftests for | ||
* concrete examples (e.g. tc_links and tc_opts test cases of test_progs). | ||
* | ||
* Usage of bpf_mprog_{attach,detach,query}() core APIs with pseudo code: | ||
* | ||
* Attach case: | ||
* | ||
* struct bpf_mprog_entry *entry, *entry_new; | ||
* int ret; | ||
* | ||
* // bpf_mprog user-side lock | ||
* // fetch active @entry from attach location | ||
* [...] | ||
* ret = bpf_mprog_attach(entry, &entry_new, [...]); | ||
* if (!ret) { | ||
* if (entry != entry_new) { | ||
* // swap @entry to @entry_new at attach location | ||
* // ensure there are no inflight users of @entry: | ||
* synchronize_rcu(); | ||
* } | ||
* bpf_mprog_commit(entry); | ||
* } else { | ||
* // error path, bail out, propagate @ret | ||
* } | ||
* // bpf_mprog user-side unlock | ||
* | ||
* Detach case: | ||
* | ||
* struct bpf_mprog_entry *entry, *entry_new; | ||
* int ret; | ||
* | ||
* // bpf_mprog user-side lock | ||
* // fetch active @entry from attach location | ||
* [...] | ||
* ret = bpf_mprog_detach(entry, &entry_new, [...]); | ||
* if (!ret) { | ||
* // all (*) marked is optional and depends on the use-case | ||
* // whether bpf_mprog_bundle should be freed or not | ||
* if (!bpf_mprog_total(entry_new)) (*) | ||
* entry_new = NULL; (*) | ||
* // swap @entry to @entry_new at attach location | ||
* // ensure there are no inflight users of @entry: | ||
* synchronize_rcu(); | ||
* bpf_mprog_commit(entry); | ||
* if (!entry_new) (*) | ||
* // free bpf_mprog_bundle (*) | ||
* } else { | ||
* // error path, bail out, propagate @ret | ||
* } | ||
* // bpf_mprog user-side unlock | ||
* | ||
* Query case: | ||
* | ||
* struct bpf_mprog_entry *entry; | ||
* int ret; | ||
* | ||
* // bpf_mprog user-side lock | ||
* // fetch active @entry from attach location | ||
* [...] | ||
* ret = bpf_mprog_query(attr, uattr, entry); | ||
* // bpf_mprog user-side unlock | ||
* | ||
* Data/fast path: | ||
* | ||
* struct bpf_mprog_entry *entry; | ||
* struct bpf_mprog_fp *fp; | ||
* struct bpf_prog *prog; | ||
* int ret = [...]; | ||
* | ||
* rcu_read_lock(); | ||
* // fetch active @entry from attach location | ||
* [...] | ||
* bpf_mprog_foreach_prog(entry, fp, prog) { | ||
* ret = bpf_prog_run(prog, [...]); | ||
* // process @ret from program | ||
* } | ||
* [...] | ||
* rcu_read_unlock(); | ||
* | ||
* bpf_mprog locking considerations: | ||
* | ||
* bpf_mprog_{attach,detach,query}() must be protected by an external lock | ||
* (like RTNL in case of tcx). | ||
* | ||
* bpf_mprog_entry pointer can be an __rcu annotated pointer (in case of tcx | ||
* the netdevice has tcx_ingress and tcx_egress __rcu pointer) which gets | ||
* updated via rcu_assign_pointer() pointing to the active bpf_mprog_entry of | ||
* the bpf_mprog_bundle. | ||
* | ||
* Fast path accesses the active bpf_mprog_entry within RCU critical section | ||
* (in case of tcx it runs in NAPI which provides RCU protection there, | ||
* other users might need explicit rcu_read_lock()). The bpf_mprog_commit() | ||
* assumes that for the old bpf_mprog_entry there are no inflight users | ||
* anymore. | ||
* | ||
* The READ_ONCE()/WRITE_ONCE() pairing for bpf_mprog_fp's prog access is for | ||
* the replacement case where we don't swap the bpf_mprog_entry. | ||
*/ | ||
|
||
#define bpf_mprog_foreach_tuple(entry, fp, cp, t) \ | ||
for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\ | ||
({ \ | ||
t.prog = READ_ONCE(fp->prog); \ | ||
t.link = cp->link; \ | ||
t.prog; \ | ||
}); \ | ||
fp++, cp++) | ||
|
||
#define bpf_mprog_foreach_prog(entry, fp, p) \ | ||
for (fp = &entry->fp_items[0]; \ | ||
(p = READ_ONCE(fp->prog)); \ | ||
fp++) | ||
|
||
#define BPF_MPROG_MAX 64 | ||
|
||
struct bpf_mprog_fp { | ||
struct bpf_prog *prog; | ||
}; | ||
|
||
struct bpf_mprog_cp { | ||
struct bpf_link *link; | ||
}; | ||
|
||
struct bpf_mprog_entry { | ||
struct bpf_mprog_fp fp_items[BPF_MPROG_MAX]; | ||
struct bpf_mprog_bundle *parent; | ||
}; | ||
|
||
struct bpf_mprog_bundle { | ||
struct bpf_mprog_entry a; | ||
struct bpf_mprog_entry b; | ||
struct bpf_mprog_cp cp_items[BPF_MPROG_MAX]; | ||
struct bpf_prog *ref; | ||
atomic64_t revision; | ||
u32 count; | ||
}; | ||
|
||
struct bpf_tuple { | ||
struct bpf_prog *prog; | ||
struct bpf_link *link; | ||
}; | ||
|
||
static inline struct bpf_mprog_entry * | ||
bpf_mprog_peer(const struct bpf_mprog_entry *entry) | ||
{ | ||
if (entry == &entry->parent->a) | ||
return &entry->parent->b; | ||
else | ||
return &entry->parent->a; | ||
} | ||
|
||
static inline void bpf_mprog_bundle_init(struct bpf_mprog_bundle *bundle) | ||
{ | ||
BUILD_BUG_ON(sizeof(bundle->a.fp_items[0]) > sizeof(u64)); | ||
BUILD_BUG_ON(ARRAY_SIZE(bundle->a.fp_items) != | ||
ARRAY_SIZE(bundle->cp_items)); | ||
|
||
memset(bundle, 0, sizeof(*bundle)); | ||
atomic64_set(&bundle->revision, 1); | ||
bundle->a.parent = bundle; | ||
bundle->b.parent = bundle; | ||
} | ||
|
||
static inline void bpf_mprog_inc(struct bpf_mprog_entry *entry) | ||
{ | ||
entry->parent->count++; | ||
} | ||
|
||
static inline void bpf_mprog_dec(struct bpf_mprog_entry *entry) | ||
{ | ||
entry->parent->count--; | ||
} | ||
|
||
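static inline int bpf_mprog_max(void) | ||
{ | ||
/* The last fp_items[] slot is reserved as the NULL terminator that | ||
* bpf_mprog_foreach_{prog,tuple}() stop on, hence the "- 1". | ||
*/ | ||
return ARRAY_SIZE(((struct bpf_mprog_entry *)NULL)->fp_items) - 1; | ||
} | ||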
|
||
static inline int bpf_mprog_total(struct bpf_mprog_entry *entry) | ||
{ | ||
int total = entry->parent->count; | ||
|
||
WARN_ON_ONCE(total > bpf_mprog_max()); | ||
return total; | ||
} | ||
|
||
static inline bool bpf_mprog_exists(struct bpf_mprog_entry *entry, | ||
struct bpf_prog *prog) | ||
{ | ||
const struct bpf_mprog_fp *fp; | ||
const struct bpf_prog *tmp; | ||
|
||
bpf_mprog_foreach_prog(entry, fp, tmp) { | ||
if (tmp == prog) | ||
return true; | ||
} | ||
return false; | ||
} | ||
|
||
static inline void bpf_mprog_mark_for_release(struct bpf_mprog_entry *entry, | ||
struct bpf_tuple *tuple) | ||
{ | ||
WARN_ON_ONCE(entry->parent->ref); | ||
if (!tuple->link) | ||
entry->parent->ref = tuple->prog; | ||
} | ||
|
||
static inline void bpf_mprog_complete_release(struct bpf_mprog_entry *entry) | ||
{ | ||
/* In the non-link case prog deletions can only drop the reference | ||
* to the prog after the bpf_mprog_entry got swapped and the | ||
* bpf_mprog ensured that there are no inflight users anymore. | ||
* | ||
* Paired with bpf_mprog_mark_for_release(). | ||
*/ | ||
if (entry->parent->ref) { | ||
bpf_prog_put(entry->parent->ref); | ||
entry->parent->ref = NULL; | ||
} | ||
} | ||
|
||
static inline void bpf_mprog_revision_new(struct bpf_mprog_entry *entry) | ||
{ | ||
atomic64_inc(&entry->parent->revision); | ||
} | ||
|
||
static inline void bpf_mprog_commit(struct bpf_mprog_entry *entry) | ||
{ | ||
bpf_mprog_complete_release(entry); | ||
bpf_mprog_revision_new(entry); | ||
} | ||
|
||
static inline u64 bpf_mprog_revision(struct bpf_mprog_entry *entry) | ||
{ | ||
return atomic64_read(&entry->parent->revision); | ||
} | ||
|
||
static inline void bpf_mprog_entry_copy(struct bpf_mprog_entry *dst, | ||
struct bpf_mprog_entry *src) | ||
{ | ||
memcpy(dst->fp_items, src->fp_items, sizeof(src->fp_items)); | ||
} | ||
|
||
static inline void bpf_mprog_entry_grow(struct bpf_mprog_entry *entry, int idx) | ||
{ | ||
int total = bpf_mprog_total(entry); | ||
|
||
memmove(entry->fp_items + idx + 1, | ||
entry->fp_items + idx, | ||
(total - idx) * sizeof(struct bpf_mprog_fp)); | ||
|
||
memmove(entry->parent->cp_items + idx + 1, | ||
entry->parent->cp_items + idx, | ||
(total - idx) * sizeof(struct bpf_mprog_cp)); | ||
} | ||
|
||
static inline void bpf_mprog_entry_shrink(struct bpf_mprog_entry *entry, int idx) | ||
{ | ||
/* Total array size is needed in this case to ensure the NULL | ||
* entry is copied at the end. | ||
*/ | ||
int total = ARRAY_SIZE(entry->fp_items); | ||
|
||
memmove(entry->fp_items + idx, | ||
entry->fp_items + idx + 1, | ||
(total - idx - 1) * sizeof(struct bpf_mprog_fp)); | ||
|
||
memmove(entry->parent->cp_items + idx, | ||
entry->parent->cp_items + idx + 1, | ||
(total - idx - 1) * sizeof(struct bpf_mprog_cp)); | ||
} | ||
|
||
static inline void bpf_mprog_read(struct bpf_mprog_entry *entry, u32 idx, | ||
struct bpf_mprog_fp **fp, | ||
struct bpf_mprog_cp **cp) | ||
{ | ||
*fp = &entry->fp_items[idx]; | ||
*cp = &entry->parent->cp_items[idx]; | ||
} | ||
|
||
static inline void bpf_mprog_write(struct bpf_mprog_fp *fp, | ||
struct bpf_mprog_cp *cp, | ||
struct bpf_tuple *tuple) | ||
{ | ||
WRITE_ONCE(fp->prog, tuple->prog); | ||
cp->link = tuple->link; | ||
} | ||
|
||
int bpf_mprog_attach(struct bpf_mprog_entry *entry, | ||
struct bpf_mprog_entry **entry_new, | ||
struct bpf_prog *prog_new, struct bpf_link *link, | ||
struct bpf_prog *prog_old, | ||
u32 flags, u32 id_or_fd, u64 revision); | ||
|
||
int bpf_mprog_detach(struct bpf_mprog_entry *entry, | ||
struct bpf_mprog_entry **entry_new, | ||
struct bpf_prog *prog, struct bpf_link *link, | ||
u32 flags, u32 id_or_fd, u64 revision); | ||
|
||
int bpf_mprog_query(const union bpf_attr *attr, union bpf_attr __user *uattr, | ||
struct bpf_mprog_entry *entry); | ||
|
||
static inline bool bpf_mprog_supported(enum bpf_prog_type type) | ||
{ | ||
switch (type) { | ||
case BPF_PROG_TYPE_SCHED_CLS: | ||
return true; | ||
default: | ||
return false; | ||
} | ||
} | ||
#endif /* __BPF_MPROG_H */ |
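The header's a/b entry pair, NULL-terminated fp_items[] array and memmove-based grow/shrink helpers can be easier to follow in a small user-space model. The sketch below is not kernel code: every `model_*` name and `MODEL_MAX` are hypothetical stand-ins, strings take the place of `struct bpf_prog` pointers, and the cp_items[] side, locking, RCU and revision handling are left out. The copy-to-peer-then-modify flow is an assumption about how the helpers are meant to be combined, since the bodies of bpf_mprog_attach() and bpf_mprog_detach() are not part of this header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 8 /* hypothetical stand-in for BPF_MPROG_MAX (64) */

struct model_bundle;

struct model_entry {
	const char *items[MODEL_MAX]; /* models fp_items[]; NULL ends a walk */
	struct model_bundle *parent;
};

struct model_bundle {
	struct model_entry a, b; /* one active entry, one peer to build into */
	unsigned int count;
};

/* Walk until the first NULL slot, like bpf_mprog_foreach_prog(). */
#define model_foreach(entry, pos, name) \
	for (pos = &(entry)->items[0]; (name = *pos); pos++)

static void model_bundle_init(struct model_bundle *bundle)
{
	memset(bundle, 0, sizeof(*bundle));
	bundle->a.parent = bundle;
	bundle->b.parent = bundle;
}

/* Same idea as bpf_mprog_peer(): return the entry that is not @entry. */
static struct model_entry *model_peer(struct model_entry *entry)
{
	return entry == &entry->parent->a ? &entry->parent->b : &entry->parent->a;
}

/* Build the new state in the peer and return it; NULL if full or idx is bad. */
static struct model_entry *model_insert(struct model_entry *entry,
					unsigned int idx, const char *name)
{
	struct model_entry *peer = model_peer(entry);
	unsigned int total = entry->parent->count;

	/* One slot always stays NULL, so at most MODEL_MAX - 1 items fit. */
	if (total >= MODEL_MAX - 1 || idx > total)
		return NULL;
	memcpy(peer->items, entry->items, sizeof(entry->items));
	/* Open a gap at idx, as bpf_mprog_entry_grow() does. */
	memmove(peer->items + idx + 1, peer->items + idx,
		(total - idx) * sizeof(peer->items[0]));
	peer->items[idx] = name;
	entry->parent->count++;
	return peer;
}

static struct model_entry *model_delete(struct model_entry *entry,
					unsigned int idx)
{
	struct model_entry *peer = model_peer(entry);

	if (idx >= entry->parent->count)
		return NULL;
	memcpy(peer->items, entry->items, sizeof(entry->items));
	/* Move the whole tail so the NULL terminator slides down too,
	 * as bpf_mprog_entry_shrink() does.
	 */
	memmove(peer->items + idx, peer->items + idx + 1,
		(MODEL_MAX - idx - 1) * sizeof(peer->items[0]));
	entry->parent->count--;
	return peer;
}

/* Join the names in walk order (e.g. "A,B,C") to make the order visible. */
static const char *model_order(struct model_entry *entry, char *buf, size_t len)
{
	const char **pos, *name;

	buf[0] = '\0';
	model_foreach(entry, pos, name) {
		if (buf[0])
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, name, len - strlen(buf) - 1);
	}
	return buf;
}
```

A caller would start with `&bundle.a` as the active entry and treat each non-NULL return value as the new active entry. In the kernel, that hand-over is the point where the attach location is swapped and synchronize_rcu() is called before bpf_mprog_commit(). Building the new state in the peer means readers walking the old entry never see a half-shifted array.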