Skip to content

Commit

Permalink
Merge branch 'bpf-perf-link'
Browse files Browse the repository at this point in the history
Andrii Nakryiko says:

====================
This patch set implements an ability for users to specify custom black box u64
value for each BPF program attachment, bpf_cookie, which is available to BPF
program at runtime. This is a feature that's critically missing for cases when
some sort of generic processing needs to be done by the common BPF program
logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
many uniformly handled kprobes) and it's important to be able to distinguish
between each BPF hook at runtime (e.g., for additional configuration lookup).

The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
design decision. Making this configurable by users adds unnecessary complexity
(extra memory allocations, extra complications on the verifier side to validate
accesses to variable-sized data area) while not really opening up new
possibilities. If user's use case requires storing more data per attachment,
it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where
bpf_cookie would be used as an index into respective storage, populated by
user-space code before creating BPF link. This gives user all the flexibility
and control while keeping BPF verifier and BPF helper API simple.

Currently, similar functionality can only be achieved through:

  - code-generation and BPF program cloning, which is very complicated and
    unmaintainable;
  - on-the-fly C code generation and further runtime compilation, which is
    what BCC uses and allows to do pretty simply. The big downside is a very
    heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
    many BPF program clones and the compilation process itself);
  - in some cases (kprobes and sometimes uprobes) it's possible to do function
    IP lookup to get function-specific configuration. This doesn't work for
    all the cases (e.g., when attaching uprobes to shared libraries) and has
    higher runtime overhead and additional programming complexity due to
    BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
    BPF helper was added, it was also very complicated and unstable (API-wise)
    to get traced function's IP from fentry/fexit and kretprobe.

With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
to build generic tracing tooling simply and efficiently, ability to provide
additional bpf_cookie value for each *attachment* (as opposed to each BPF
program) is extremely important. Two immediate users of this functionality are
going to be libbpf-based USDT library (currently in development) and retsnoop
([0]), but I'm sure more applications will come once users get this feature in
their kernels.

To achieve above described, all perf_event-based BPF hooks are made available
through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows to use common
LINK_CREATE command for program attachments and generally brings
perf_event-based attachments into a common BPF link infrastructure.

With that, LINK_CREATE gets ability to pass throught bpf_cookie value during
link creation (BPF program attachment) time. bpf_get_attach_cookie() BPF
helper is added to allow fetching this value at runtime from BPF program side.
BPF cookie is stored either on struct perf_event itself and fetched from the
BPF program context, or is passed through ambient BPF run context, added in
c7603cf ("bpf: Add ambient BPF runtime context stored in current").

On the libbpf side of things, BPF perf link is utilized whenever is supported
by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD.
All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
through corresponding opts structs.

Last part of the patch set adds few self-tests utilizing new APIs.

There are also a few refactorings along the way to make things cleaner and
easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
throughout libbpf and selftests.

Follow-up patches will extend bpf_cookie to fentry/fexit programs.

While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
USDT semaphore (reference counter) offset. Update attach_probe selftests to
validate its functionality. This is another feature (along with bpf_cookie)
required for implementing libbpf-based USDT solution.

  [0] https://github.com/anakryiko/retsnoop

v4->v5:
  - rebase on latest bpf-next to resolve merge conflict;
  - add ref_ctr_offset to uprobe_opts and corresponding selftest;
v3->v4:
  - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
  - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
v2->v3:
  - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
  - fix BPF_LINK_TYPE_PERF_EVENT value fix (Jiri);
  - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
v1->v2:
  - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  • Loading branch information
Daniel Borkmann committed Aug 16, 2021
2 parents 1bda52f + 4bd11e0 commit 3a4ce01
Showing 40 changed files with 1,315 additions and 357 deletions.
4 changes: 2 additions & 2 deletions Documentation/networking/filter.rst
Original file line number Diff line number Diff line change
@@ -638,8 +638,8 @@ extension, PTP dissector/classifier, and much more. They are all internally
converted by the kernel into the new instruction set representation and run
in the eBPF interpreter. For in-kernel handlers, this all works transparently
by using bpf_prog_create() for setting up the filter, resp.
bpf_prog_destroy() for destroying it. The macro
BPF_PROG_RUN(filter, ctx) transparently invokes eBPF interpreter or JITed
bpf_prog_destroy() for destroying it. The function
bpf_prog_run(filter, ctx) transparently invokes eBPF interpreter or JITed
code to run the filter. 'filter' is a pointer to struct bpf_prog that we
got from bpf_prog_create(), and 'ctx' the given context (e.g.
skb pointer). All constraints and restrictions from bpf_check_classic() apply
6 changes: 3 additions & 3 deletions drivers/media/rc/bpf-lirc.c
Original file line number Diff line number Diff line change
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
goto unlock;
}

ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
if (ret < 0)
goto unlock;

@@ -193,7 +193,7 @@ static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
}

old_array = lirc_rcu_dereference(raw->progs);
ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
ret = bpf_prog_array_copy(old_array, prog, NULL, 0, &new_array);
/*
* Do not use bpf_prog_array_delete_safe() as we would end up
* with a dummy entry in the array, and the we would free the
@@ -217,7 +217,7 @@ void lirc_bpf_run(struct rc_dev *rcdev, u32 sample)
raw->bpf_sample = sample;

if (raw->progs)
BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_sample, BPF_PROG_RUN);
BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_sample, bpf_prog_run);
}

/*
8 changes: 4 additions & 4 deletions drivers/net/ppp/ppp_generic.c
Original file line number Diff line number Diff line change
@@ -1744,7 +1744,7 @@ ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
a four-byte PPP header on each packet */
*(u8 *)skb_push(skb, 2) = 1;
if (ppp->pass_filter &&
BPF_PROG_RUN(ppp->pass_filter, skb) == 0) {
bpf_prog_run(ppp->pass_filter, skb) == 0) {
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: outbound frame "
@@ -1754,7 +1754,7 @@ ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
}
/* if this packet passes the active filter, record the time */
if (!(ppp->active_filter &&
BPF_PROG_RUN(ppp->active_filter, skb) == 0))
bpf_prog_run(ppp->active_filter, skb) == 0))
ppp->last_xmit = jiffies;
skb_pull(skb, 2);
#else
@@ -2468,7 +2468,7 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)

*(u8 *)skb_push(skb, 2) = 0;
if (ppp->pass_filter &&
BPF_PROG_RUN(ppp->pass_filter, skb) == 0) {
bpf_prog_run(ppp->pass_filter, skb) == 0) {
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: inbound frame "
@@ -2477,7 +2477,7 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
return;
}
if (!(ppp->active_filter &&
BPF_PROG_RUN(ppp->active_filter, skb) == 0))
bpf_prog_run(ppp->active_filter, skb) == 0))
ppp->last_recv = jiffies;
__skb_pull(skb, 2);
} else
2 changes: 1 addition & 1 deletion drivers/net/team/team_mode_loadbalance.c
Original file line number Diff line number Diff line change
@@ -197,7 +197,7 @@ static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
fp = rcu_dereference_bh(lb_priv->fp);
if (unlikely(!fp))
return 0;
lhash = BPF_PROG_RUN(fp, skb);
lhash = bpf_prog_run(fp, skb);
c = (char *) &lhash;
return c[0] ^ c[1] ^ c[2] ^ c[3];
}
200 changes: 120 additions & 80 deletions include/linux/bpf.h
Original file line number Diff line number Diff line change
@@ -1103,7 +1103,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
/* an array of programs to be executed under rcu_lock.
*
* Typical usage:
* ret = BPF_PROG_RUN_ARRAY(&bpf_prog_array, ctx, BPF_PROG_RUN);
* ret = BPF_PROG_RUN_ARRAY(&bpf_prog_array, ctx, bpf_prog_run);
*
* the structure returned by bpf_prog_array_alloc() should be populated
* with program pointers and the last pointer must be NULL.
@@ -1114,7 +1114,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
*/
struct bpf_prog_array_item {
struct bpf_prog *prog;
struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
union {
struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
u64 bpf_cookie;
};
};

struct bpf_prog_array {
@@ -1140,73 +1143,133 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array,
int bpf_prog_array_copy(struct bpf_prog_array *old_array,
struct bpf_prog *exclude_prog,
struct bpf_prog *include_prog,
u64 bpf_cookie,
struct bpf_prog_array **new_array);

struct bpf_run_ctx {};

struct bpf_cg_run_ctx {
struct bpf_run_ctx run_ctx;
struct bpf_prog_array_item *prog_item;
const struct bpf_prog_array_item *prog_item;
};

struct bpf_trace_run_ctx {
struct bpf_run_ctx run_ctx;
u64 bpf_cookie;
};

static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
{
struct bpf_run_ctx *old_ctx = NULL;

#ifdef CONFIG_BPF_SYSCALL
old_ctx = current->bpf_ctx;
current->bpf_ctx = new_ctx;
#endif
return old_ctx;
}

static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
{
#ifdef CONFIG_BPF_SYSCALL
current->bpf_ctx = old_ctx;
#endif
}

/* BPF program asks to bypass CAP_NET_BIND_SERVICE in bind. */
#define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE (1 << 0)
/* BPF program asks to set CN on the packet. */
#define BPF_RET_SET_CN (1 << 0)

#define BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, ret_flags) \
({ \
struct bpf_prog_array_item *_item; \
struct bpf_prog *_prog; \
struct bpf_prog_array *_array; \
struct bpf_run_ctx *old_run_ctx; \
struct bpf_cg_run_ctx run_ctx; \
u32 _ret = 1; \
u32 func_ret; \
migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
_item = &_array->items[0]; \
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); \
while ((_prog = READ_ONCE(_item->prog))) { \
run_ctx.prog_item = _item; \
func_ret = func(_prog, ctx); \
_ret &= (func_ret & 1); \
*(ret_flags) |= (func_ret >> 1); \
_item++; \
} \
bpf_reset_run_ctx(old_run_ctx); \
rcu_read_unlock(); \
migrate_enable(); \
_ret; \
})

#define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null, set_cg_storage) \
({ \
struct bpf_prog_array_item *_item; \
struct bpf_prog *_prog; \
struct bpf_prog_array *_array; \
struct bpf_run_ctx *old_run_ctx; \
struct bpf_cg_run_ctx run_ctx; \
u32 _ret = 1; \
migrate_disable(); \
rcu_read_lock(); \
_array = rcu_dereference(array); \
if (unlikely(check_non_null && !_array))\
goto _out; \
_item = &_array->items[0]; \
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);\
while ((_prog = READ_ONCE(_item->prog))) { \
run_ctx.prog_item = _item; \
_ret &= func(_prog, ctx); \
_item++; \
} \
bpf_reset_run_ctx(old_run_ctx); \
_out: \
rcu_read_unlock(); \
migrate_enable(); \
_ret; \
})
typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx);

static __always_inline u32
BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu,
const void *ctx, bpf_prog_run_fn run_prog,
u32 *ret_flags)
{
const struct bpf_prog_array_item *item;
const struct bpf_prog *prog;
const struct bpf_prog_array *array;
struct bpf_run_ctx *old_run_ctx;
struct bpf_cg_run_ctx run_ctx;
u32 ret = 1;
u32 func_ret;

migrate_disable();
rcu_read_lock();
array = rcu_dereference(array_rcu);
item = &array->items[0];
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
while ((prog = READ_ONCE(item->prog))) {
run_ctx.prog_item = item;
func_ret = run_prog(prog, ctx);
ret &= (func_ret & 1);
*(ret_flags) |= (func_ret >> 1);
item++;
}
bpf_reset_run_ctx(old_run_ctx);
rcu_read_unlock();
migrate_enable();
return ret;
}

static __always_inline u32
BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu,
const void *ctx, bpf_prog_run_fn run_prog)
{
const struct bpf_prog_array_item *item;
const struct bpf_prog *prog;
const struct bpf_prog_array *array;
struct bpf_run_ctx *old_run_ctx;
struct bpf_cg_run_ctx run_ctx;
u32 ret = 1;

migrate_disable();
rcu_read_lock();
array = rcu_dereference(array_rcu);
item = &array->items[0];
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
while ((prog = READ_ONCE(item->prog))) {
run_ctx.prog_item = item;
ret &= run_prog(prog, ctx);
item++;
}
bpf_reset_run_ctx(old_run_ctx);
rcu_read_unlock();
migrate_enable();
return ret;
}

static __always_inline u32
BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
const void *ctx, bpf_prog_run_fn run_prog)
{
const struct bpf_prog_array_item *item;
const struct bpf_prog *prog;
const struct bpf_prog_array *array;
struct bpf_run_ctx *old_run_ctx;
struct bpf_trace_run_ctx run_ctx;
u32 ret = 1;

migrate_disable();
rcu_read_lock();
array = rcu_dereference(array_rcu);
if (unlikely(!array))
goto out;
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
item = &array->items[0];
while ((prog = READ_ONCE(item->prog))) {
run_ctx.bpf_cookie = item->bpf_cookie;
ret &= run_prog(prog, ctx);
item++;
}
bpf_reset_run_ctx(old_run_ctx);
out:
rcu_read_unlock();
migrate_enable();
return ret;
}

/* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs
* so BPF programs can request cwr for TCP packets.
@@ -1235,7 +1298,7 @@ _out: \
u32 _flags = 0; \
bool _cn; \
u32 _ret; \
_ret = BPF_PROG_RUN_ARRAY_FLAGS(array, ctx, func, &_flags); \
_ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, &_flags); \
_cn = _flags & BPF_RET_SET_CN; \
if (_ret) \
_ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \
@@ -1244,12 +1307,6 @@ _out: \
_ret; \
})

#define BPF_PROG_RUN_ARRAY(array, ctx, func) \
__BPF_PROG_RUN_ARRAY(array, ctx, func, false, true)

#define BPF_PROG_RUN_ARRAY_CHECK(array, ctx, func) \
__BPF_PROG_RUN_ARRAY(array, ctx, func, true, false)

#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
extern struct mutex bpf_stats_enabled_mutex;
@@ -1284,20 +1341,6 @@ static inline void bpf_enable_instrumentation(void)
migrate_enable();
}

static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
{
struct bpf_run_ctx *old_ctx;

old_ctx = current->bpf_ctx;
current->bpf_ctx = new_ctx;
return old_ctx;
}

static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
{
current->bpf_ctx = old_ctx;
}

extern const struct file_operations bpf_map_fops;
extern const struct file_operations bpf_prog_fops;
extern const struct file_operations bpf_iter_fops;
@@ -2059,9 +2102,6 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_sk_getsockopt_proto;

const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);

const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);

3 changes: 3 additions & 0 deletions include/linux/bpf_types.h
Original file line number Diff line number Diff line change
@@ -136,3 +136,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#endif
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
Loading

0 comments on commit 3a4ce01

Please sign in to comment.