Skip to content

Commit

Permalink
Merge branch 'Add skb + xdp dynptrs'
Browse files Browse the repository at this point in the history
Joanne Koong says:

====================

This patchset is the 2nd in the dynptr series. The 1st can be found here [0].

This patchset adds skb and xdp type dynptrs, which have two main benefits for
packet parsing:
    * allowing operations on sizes that are not statically known at
      compile-time (eg variable-sized accesses).
    * more ergonomic and less brittle iteration through data (eg does not need
      manual if checking for being within bounds of data_end)

When comparing the differences in runtime for packet parsing without dynptrs
vs. with dynptrs, there is no noticeable difference. Patch 9 contains more
details as well as examples of how to use skb and xdp dynptrs.

[0] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/
---
Changelog:

v12 = https://lore.kernel.org/bpf/20230226085120.3907863-1-joannelkoong@gmail.com/
v12 -> v13:
    * Fix missing { } for case statement

v11 = https://lore.kernel.org/bpf/20230222060747.2562549-1-joannelkoong@gmail.com/
v11 -> v12:
    * Change constant mem size checking to use "__szk" kfunc annotation
      for slices
    * Use autoloading for success selftests

v10 = https://lore.kernel.org/bpf/20230216225524.1192789-1-joannelkoong@gmail.com/
v10 -> v11:
    * Reject bpf_dynptr_slice_rdwr() for non-writable progs at load time
      instead of runtime
    * Add additional patch (__uninit kfunc annotation)
    * Expand on documentation
    * Add bpf_dynptr_write() calls for persisting writes in tests

v9 = https://lore.kernel.org/bpf/20230127191703.3864860-1-joannelkoong@gmail.com/
v9 -> v10:
    * Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr interface
    * Add some more tests
    * Split up patchset into more parts to make it easier to review

v8 = https://lore.kernel.org/bpf/20230126233439.3739120-1-joannelkoong@gmail.com/
v8 -> v9:
    * Fix dynptr_get_type() to check non-stack dynptrs

v7 = https://lore.kernel.org/bpf/20221021011510.1890852-1-joannelkoong@gmail.com/
v7 -> v8:
    * Change helpers to kfuncs
    * Add 2 new patches (1/5 and 2/5)

v6 = https://lore.kernel.org/bpf/20220907183129.745846-1-joannelkoong@gmail.com/
v6 -> v7
    * Change bpf_dynptr_data() to return read-only data slices if the skb prog
      is read-only (Martin)
    * Add test "skb_invalid_write" to test that writes to rd-only data slices
      are rejected

v5 = https://lore.kernel.org/bpf/20220831183224.3754305-1-joannelkoong@gmail.com/
v5 -> v6
    * Address kernel test robot errors by static inlining

v4 = https://lore.kernel.org/bpf/20220822235649.2218031-1-joannelkoong@gmail.com/
v4 -> v5
    * Address kernel test robot errors for configs w/out CONFIG_NET set
    * For data slices, return PTR_TO_MEM instead of PTR_TO_PACKET (Kumar)
    * Split selftests into subtests (Andrii)
    * Remove insn patching. Use rdonly and rdwr protos for dynptr skb
      construction (Andrii)
    * bpf_dynptr_data() returns NULL for rd-only dynptrs. There will be a
      separate bpf_dynptr_data_rdonly() added later (Andrii and Kumar)

v3 = https://lore.kernel.org/bpf/20220822193442.657638-1-joannelkoong@gmail.com/
v3 -> v4
    * Forgot to commit --amend the kernel test robot error fixups

v2 = https://lore.kernel.org/bpf/20220811230501.2632393-1-joannelkoong@gmail.com/
v2 -> v3
    * Fix kernel test robot build test errors

v1 = https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/
v1 -> v2
  * Return data slices to rd-only skb dynptrs (Martin)
  * bpf_dynptr_write allows writes to frags for skb dynptrs, but always
    invalidates associated data slices (Martin)
  * Use switch casing instead of ifs (Andrii)
  * Use 0xFD for experimental kind number in the selftest (Zvi)
  * Put selftest conversions w/ dynptrs into new files (Alexei)
  * Add new selftest "test_cls_redirect_dynptr.c"
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  • Loading branch information
Alexei Starovoitov committed Mar 1, 2023
2 parents ae256f9 + cfa7b01 commit c4b5c5b
Show file tree
Hide file tree
Showing 25 changed files with 3,322 additions and 186 deletions.
17 changes: 17 additions & 0 deletions Documentation/bpf/kfuncs.rst
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,23 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.

2.2.2 __uninit Annotation
--------------------

This annotation is used to indicate that the argument will be treated as
uninitialized.

An example is given below::

__bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
{
...
}

Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
Expand Down
95 changes: 64 additions & 31 deletions include/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -607,11 +607,18 @@ enum bpf_type_flag {
*/
NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS),

/* DYNPTR points to sk_buff */
DYNPTR_TYPE_SKB = BIT(15 + BPF_BASE_TYPE_BITS),

/* DYNPTR points to xdp_buff */
DYNPTR_TYPE_XDP = BIT(16 + BPF_BASE_TYPE_BITS),

__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};

#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF)
#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
| DYNPTR_TYPE_XDP)

/* Max number of base types. */
#define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS)
Expand Down Expand Up @@ -1124,6 +1131,37 @@ static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func(
return bpf_func(ctx, insnsi);
}

/* the implementation of the opaque uapi struct bpf_dynptr */
struct bpf_dynptr_kern {
void *data;
/* Size represents the number of usable bytes of dynptr data.
* If for example the offset is at 4 for a local dynptr whose data is
* of type u64, the number of usable bytes is 4.
*
* The upper 8 bits are reserved. It is as follows:
* Bits 0 - 23 = size
* Bits 24 - 30 = dynptr type
* Bit 31 = whether dynptr is read-only
*/
u32 size;
u32 offset;
} __aligned(8);

enum bpf_dynptr_type {
BPF_DYNPTR_TYPE_INVALID,
/* Points to memory that is local to the bpf program */
BPF_DYNPTR_TYPE_LOCAL,
/* Underlying data is a ringbuf record */
BPF_DYNPTR_TYPE_RINGBUF,
/* Underlying data is a sk_buff */
BPF_DYNPTR_TYPE_SKB,
/* Underlying data is a xdp_buff */
BPF_DYNPTR_TYPE_XDP,
};

int bpf_dynptr_check_size(u32 size);
u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);

#ifdef CONFIG_BPF_JIT
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
Expand Down Expand Up @@ -2266,6 +2304,11 @@ static inline bool has_current_bpf_ctx(void)
}

void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog);

void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size);
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
#else /* !CONFIG_BPF_SYSCALL */
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
{
Expand Down Expand Up @@ -2495,6 +2538,19 @@ static inline void bpf_prog_inc_misses_counter(struct bpf_prog *prog)
static inline void bpf_cgrp_storage_free(struct cgroup *cgroup)
{
}

static inline void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size)
{
}

static inline void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
{
}

static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
{
}
#endif /* CONFIG_BPF_SYSCALL */

void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
Expand Down Expand Up @@ -2801,6 +2857,8 @@ u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size);
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
struct bpf_dynptr_kern *ptr);
#else
static inline bool bpf_sock_common_is_valid_access(int off, int size,
enum bpf_access_type type,
Expand All @@ -2822,6 +2880,11 @@ static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
{
return 0;
}
static inline int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
struct bpf_dynptr_kern *ptr)
{
return -EOPNOTSUPP;
}
#endif

#ifdef CONFIG_INET
Expand Down Expand Up @@ -2913,36 +2976,6 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
u32 num_args, struct bpf_bprintf_data *data);
void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);

/* the implementation of the opaque uapi struct bpf_dynptr */
struct bpf_dynptr_kern {
void *data;
/* Size represents the number of usable bytes of dynptr data.
* If for example the offset is at 4 for a local dynptr whose data is
* of type u64, the number of usable bytes is 4.
*
* The upper 8 bits are reserved. It is as follows:
* Bits 0 - 23 = size
* Bits 24 - 30 = dynptr type
* Bit 31 = whether dynptr is read-only
*/
u32 size;
u32 offset;
} __aligned(8);

enum bpf_dynptr_type {
BPF_DYNPTR_TYPE_INVALID,
/* Points to memory that is local to the bpf program */
BPF_DYNPTR_TYPE_LOCAL,
/* Underlying data is a kernel-produced ringbuf record */
BPF_DYNPTR_TYPE_RINGBUF,
};

void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size);
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
int bpf_dynptr_check_size(u32 size);
u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);

#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
void bpf_cgroup_atype_put(int cgroup_atype);
Expand Down
3 changes: 0 additions & 3 deletions include/linux/bpf_verifier.h
Original file line number Diff line number Diff line change
Expand Up @@ -616,9 +616,6 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
enum bpf_arg_type arg_type);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno, u32 mem_size);
struct bpf_call_arg_meta;
int process_dynptr_func(struct bpf_verifier_env *env, int regno,
enum bpf_arg_type arg_type, struct bpf_call_arg_meta *meta);

/* this lives here instead of in bpf.h because it needs to dereference tgt_prog */
static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog,
Expand Down
46 changes: 46 additions & 0 deletions include/linux/filter.h
Original file line number Diff line number Diff line change
Expand Up @@ -1542,4 +1542,50 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u64 index
return XDP_REDIRECT;
}

#ifdef CONFIG_NET
int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset, void *to, u32 len);
int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from,
u32 len, u64 flags);
int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
void *buf, unsigned long len, bool flush);
#else /* CONFIG_NET */
static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
void *to, u32 len)
{
return -EOPNOTSUPP;
}

static inline int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset,
const void *from, u32 len, u64 flags)
{
return -EOPNOTSUPP;
}

static inline int __bpf_xdp_load_bytes(struct xdp_buff *xdp, u32 offset,
void *buf, u32 len)
{
return -EOPNOTSUPP;
}

static inline int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset,
void *buf, u32 len)
{
return -EOPNOTSUPP;
}

static inline void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
{
return NULL;
}

static inline void *bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf,
unsigned long len, bool flush)
{
return NULL;
}
#endif /* CONFIG_NET */

#endif /* __LINUX_FILTER_H__ */
18 changes: 16 additions & 2 deletions include/uapi/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -5325,18 +5325,32 @@ union bpf_attr {
* Description
* Write *len* bytes from *src* into *dst*, starting from *offset*
* into *dst*.
* *flags* is currently unused.
*
* *flags* must be 0 except for skb-type dynptrs.
*
* For skb-type dynptrs:
* * All data slices of the dynptr are automatically
* invalidated after **bpf_dynptr_write**\ (). This is
* because writing may pull the skb and change the
* underlying packet buffer.
*
* * For *flags*, please see the flags accepted by
* **bpf_skb_store_bytes**\ ().
* Return
* 0 on success, -E2BIG if *offset* + *len* exceeds the length
* of *dst*'s data, -EINVAL if *dst* is an invalid dynptr or if *dst*
* is a read-only dynptr or if *flags* is not 0.
* is a read-only dynptr or if *flags* is not correct. For skb-type dynptrs,
* other errors correspond to errors returned by **bpf_skb_store_bytes**\ ().
*
* void *bpf_dynptr_data(const struct bpf_dynptr *ptr, u32 offset, u32 len)
* Description
* Get a pointer to the underlying dynptr data.
*
* *len* must be a statically known value. The returned data slice
* is invalidated whenever the dynptr is invalidated.
*
* skb and xdp type dynptrs may not use bpf_dynptr_data. They should
* instead use bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
* Return
* Pointer to the underlying dynptr data, NULL if the dynptr is
* read-only, if the dynptr is invalid, or if the offset and length
Expand Down
22 changes: 22 additions & 0 deletions kernel/bpf/btf.c
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,11 @@ enum btf_kfunc_hook {
BTF_KFUNC_HOOK_TRACING,
BTF_KFUNC_HOOK_SYSCALL,
BTF_KFUNC_HOOK_FMODRET,
BTF_KFUNC_HOOK_CGROUP_SKB,
BTF_KFUNC_HOOK_SCHED_ACT,
BTF_KFUNC_HOOK_SK_SKB,
BTF_KFUNC_HOOK_SOCKET_FILTER,
BTF_KFUNC_HOOK_LWT,
BTF_KFUNC_HOOK_MAX,
};

Expand Down Expand Up @@ -5683,6 +5688,10 @@ btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
* int socket_filter_bpf_prog(struct __sk_buff *skb)
* { // no fields of skb are ever used }
*/
if (strcmp(ctx_tname, "__sk_buff") == 0 && strcmp(tname, "sk_buff") == 0)
return ctx_type;
if (strcmp(ctx_tname, "xdp_md") == 0 && strcmp(tname, "xdp_buff") == 0)
return ctx_type;
if (strcmp(ctx_tname, tname)) {
/* bpf_user_pt_regs_t is a typedef, so resolve it to
* underlying struct and check name again
Expand Down Expand Up @@ -7704,6 +7713,19 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
return BTF_KFUNC_HOOK_TRACING;
case BPF_PROG_TYPE_SYSCALL:
return BTF_KFUNC_HOOK_SYSCALL;
case BPF_PROG_TYPE_CGROUP_SKB:
return BTF_KFUNC_HOOK_CGROUP_SKB;
case BPF_PROG_TYPE_SCHED_ACT:
return BTF_KFUNC_HOOK_SCHED_ACT;
case BPF_PROG_TYPE_SK_SKB:
return BTF_KFUNC_HOOK_SK_SKB;
case BPF_PROG_TYPE_SOCKET_FILTER:
return BTF_KFUNC_HOOK_SOCKET_FILTER;
case BPF_PROG_TYPE_LWT_OUT:
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_XMIT:
case BPF_PROG_TYPE_LWT_SEG6LOCAL:
return BTF_KFUNC_HOOK_LWT;
default:
return BTF_KFUNC_HOOK_MAX;
}
Expand Down
Loading

0 comments on commit c4b5c5b

Please sign in to comment.