Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-21

We've added 85 non-merge commits during the last 12 day(s) which contain
a total of 63 files changed, 4464 insertions(+), 1484 deletions(-).

The main changes are:

1) Huge batch of verifier changes to improve BPF register bounds logic
   and range support along with a large test suite, and verifier log
   improvements, all from Andrii Nakryiko.

2) Add a new kfunc which acquires the associated cgroup of a task within
   a specific cgroup v1 hierarchy where the latter is identified by its id,
   from Yafang Shao.

3) Extend verifier to allow bpf_refcount_acquire() of a map value field
   obtained via direct load which is a use-case needed in sched_ext,
   from Dave Marchevsky.

4) Fix bpf_get_task_stack() helper to add the correct crosstask check
   for get_perf_callchain(), from Jordan Rome.

5) Fix BPF task_iter internals where lockless usage of next_thread()
   was wrong. The rework also simplifies the code, from Oleg Nesterov.

6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and rework
   certain BPF UAPI structs to address verifier failures seen in
   bpf_dynptr usage, from Yonghong Song.

7) Add BPF selftest fixes for map_percpu_stats flakes caused by the per-CPU
   BPF memory allocator failing to allocate a per-CPU pointer,
   from Hou Tao.

8) Add prep work around dynptr and string handling for kfuncs, which will
   later be used by file verification via BPF LSM and fsverity,
   from Song Liu.

9) Improve BPF selftests by updating multiple prog_tests to use ASSERT_*
   macros, from Yuran Pereira.

10) Optimize LPM trie lookup to check prefixlen before walking the trie,
    from Florian Lehner.

11) Consolidate virtio/9p configs from BPF selftests in config.vm file
    given they are needed consistently across archs, from Manu Bretelle.

12) Small BPF verifier refactor to remove register_is_const(),
    from Shung-Hsi Yu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
  selftests/bpf: reduce verboseness of reg_bounds selftest logs
  bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)
  bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()
  bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
  bpf: emit frameno for PTR_TO_STACK regs if it differs from current one
  bpf: smarter verifier log number printing logic
  bpf: omit default off=0 and imm=0 in register state log
  bpf: emit map name in register state if applicable and available
  bpf: print spilled register state in stack slot
  bpf: extract register state printing
  bpf: move verifier state printing code to kernel/bpf/log.c
  bpf: move verbose_linfo() into kernel/bpf/log.c
  bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
  bpf: Remove test for MOVSX32 with offset=32
  selftests/bpf: add iter test requiring range x range logic
  veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag
  ...
====================

Link: https://lore.kernel.org/r/20231122000500.28126-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committed Nov 22, 2023
2 parents 340bf2d + 3cbbf91 commit 5347528
Showing 63 changed files with 4,464 additions and 1,484 deletions.
24 changes: 24 additions & 0 deletions Documentation/bpf/kfuncs.rst
@@ -135,6 +135,30 @@ Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
annotation, the verifier will reject the program if a null pointer is passed in with
a nonzero size.

2.2.5 __str Annotation
----------------------------
This annotation is used to indicate that the argument is a constant string.

An example is given below::

__bpf_kfunc bpf_get_file_xattr(..., const char *name__str, ...)
{
...
}

In this case, ``bpf_get_file_xattr()`` can be called as::

bpf_get_file_xattr(..., "xattr_name", ...);

Or::

const char name[] = "xattr_name"; /* This needs to be global */
int BPF_PROG(...)
{
...
bpf_get_file_xattr(..., name, ...);
...
}

.. _BPF_kfunc_nodef:

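The __str annotation above is enforced at verification time: the argument must point to a constant, read-only string. A hedged sketch of the BPF-program side, assuming the bpf_get_file_xattr() prototype that the fsverity/BPF-LSM work named in this pull request is expected to use (the exact signature and attach point are assumptions, not part of this diff):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	/* Assumed kfunc prototype; only the __str rule is documented above. */
	extern int bpf_get_file_xattr(struct file *file, const char *name__str,
				      struct bpf_dynptr *value_ptr) __ksym;

	char value_buf[64];
	const char xattr_name[] = "user.demo"; /* constant and global, per the doc */

	SEC("lsm.s/file_open")
	int BPF_PROG(demo_file_open, struct file *file)
	{
		struct bpf_dynptr value;

		bpf_dynptr_from_mem(value_buf, sizeof(value_buf), 0, &value);

		/* A string literal would also satisfy the __str constraint. */
		if (bpf_get_file_xattr(file, xattr_name, &value) < 0)
			return 0;
		return 0;
	}

	char _license[] SEC("license") = "GPL";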
6 changes: 4 additions & 2 deletions include/linux/bpf.h
@@ -186,8 +186,8 @@ enum btf_field_type {
BPF_LIST_NODE = (1 << 6),
BPF_RB_ROOT = (1 << 7),
BPF_RB_NODE = (1 << 8),
-	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
-				 BPF_RB_NODE | BPF_RB_ROOT,
+	BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
+	BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9),
};

@@ -1226,6 +1226,8 @@ enum bpf_dynptr_type {

int bpf_dynptr_check_size(u32 size);
u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr);
const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len);
void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);

#ifdef CONFIG_BPF_JIT
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
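The two accessors declared here let kfunc implementations obtain a direct pointer into dynptr-backed memory instead of copying. A hypothetical kfunc sketch under that reading (the callee and its use of __vfs_getxattr mirror the fsverity prep work and are assumptions, not code from this commit):

	/* Hypothetical kfunc: fill the dynptr's backing memory with an xattr
	 * value, using the read-write accessor declared above.
	 */
	__bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
					   struct bpf_dynptr_kern *value_ptr)
	{
		u32 value_len = __bpf_dynptr_size(value_ptr);
		void *value = __bpf_dynptr_data_rw(value_ptr, value_len);

		if (!value) /* non-contiguous data or read-only dynptr */
			return -EINVAL;

		return __vfs_getxattr(file_dentry(file), file_inode(file),
				      name__str, value, value_len);
	}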
77 changes: 77 additions & 0 deletions include/linux/bpf_verifier.h
@@ -602,6 +602,7 @@ struct bpf_verifier_env {
int stack_size; /* number of states to be processed */
bool strict_alignment; /* perform strict pointer alignment checks */
bool test_state_freq; /* test verifier with different pruning frequency */
bool test_reg_invariants; /* fail verification on register invariants violations */
struct bpf_verifier_state *cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
struct bpf_verifier_state_list *free_list;
@@ -679,6 +680,10 @@ int bpf_vlog_init(struct bpf_verifier_log *log, u32 log_level,
void bpf_vlog_reset(struct bpf_verifier_log *log, u64 new_pos);
int bpf_vlog_finalize(struct bpf_verifier_log *log, u32 *log_size_actual);

__printf(3, 4) void verbose_linfo(struct bpf_verifier_env *env,
u32 insn_off,
const char *prefix_fmt, ...);

static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *cur = env->cur_state;
@@ -778,4 +783,76 @@ static inline bool bpf_type_has_unsafe_modifiers(u32 type)
return type_flag(type) & ~BPF_REG_TRUSTED_MODIFIERS;
}

static inline bool type_is_ptr_alloc_obj(u32 type)
{
return base_type(type) == PTR_TO_BTF_ID && type_flag(type) & MEM_ALLOC;
}

static inline bool type_is_non_owning_ref(u32 type)
{
return type_is_ptr_alloc_obj(type) && type_flag(type) & NON_OWN_REF;
}

static inline bool type_is_pkt_pointer(enum bpf_reg_type type)
{
type = base_type(type);
return type == PTR_TO_PACKET ||
type == PTR_TO_PACKET_META;
}

static inline bool type_is_sk_pointer(enum bpf_reg_type type)
{
return type == PTR_TO_SOCKET ||
type == PTR_TO_SOCK_COMMON ||
type == PTR_TO_TCP_SOCK ||
type == PTR_TO_XDP_SOCK;
}

static inline void mark_reg_scratched(struct bpf_verifier_env *env, u32 regno)
{
env->scratched_regs |= 1U << regno;
}

static inline void mark_stack_slot_scratched(struct bpf_verifier_env *env, u32 spi)
{
env->scratched_stack_slots |= 1ULL << spi;
}

static inline bool reg_scratched(const struct bpf_verifier_env *env, u32 regno)
{
return (env->scratched_regs >> regno) & 1;
}

static inline bool stack_slot_scratched(const struct bpf_verifier_env *env, u64 regno)
{
return (env->scratched_stack_slots >> regno) & 1;
}

static inline bool verifier_state_scratched(const struct bpf_verifier_env *env)
{
return env->scratched_regs || env->scratched_stack_slots;
}

static inline void mark_verifier_state_clean(struct bpf_verifier_env *env)
{
env->scratched_regs = 0U;
env->scratched_stack_slots = 0ULL;
}

/* Used for printing the entire verifier state. */
static inline void mark_verifier_state_scratched(struct bpf_verifier_env *env)
{
env->scratched_regs = ~0U;
env->scratched_stack_slots = ~0ULL;
}

const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type);
const char *dynptr_type_str(enum bpf_dynptr_type type);
const char *iter_type_str(const struct btf *btf, u32 btf_id);
const char *iter_state_str(enum bpf_iter_state state);

void print_verifier_state(struct bpf_verifier_env *env,
const struct bpf_func_state *state, bool print_all);
void print_insn_state(struct bpf_verifier_env *env, const struct bpf_func_state *state);

#endif /* _LINUX_BPF_VERIFIER_H */
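The scratched_regs/scratched_stack_slots helpers moved into this header are plain bitmasks: one u32 bit per BPF register (r0..r10) and one u64 bit per stack slot, so the log code can print only the state touched by the current instruction. A standalone illustration of the same bookkeeping (demo code, not kernel code):

	#include <stdint.h>
	#include <stdio.h>

	struct demo_env {
		uint32_t scratched_regs;	/* bit N set => rN was touched */
		uint64_t scratched_stack_slots;	/* bit N set => slot N was touched */
	};

	static void mark_reg_scratched(struct demo_env *env, unsigned int regno)
	{
		env->scratched_regs |= 1U << regno;
	}

	static int reg_scratched(const struct demo_env *env, unsigned int regno)
	{
		return (env->scratched_regs >> regno) & 1;
	}

	int main(void)
	{
		struct demo_env env = { 0 };

		mark_reg_scratched(&env, 0);	/* r0 written */
		mark_reg_scratched(&env, 6);	/* r6 written */

		for (unsigned int r = 0; r <= 10; r++)
			if (reg_scratched(&env, r))
				printf("print state of r%u\n", r);
		return 0;
	}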
1 change: 1 addition & 0 deletions include/linux/cgroup-defs.h
@@ -563,6 +563,7 @@ struct cgroup_root {

/* A list running through the active hierarchies */
struct list_head root_list;
struct rcu_head rcu;

/* Hierarchy-specific flags */
unsigned int flags;
4 changes: 3 additions & 1 deletion include/linux/cgroup.h
@@ -69,6 +69,7 @@ struct css_task_iter {
extern struct file_system_type cgroup_fs_type;
extern struct cgroup_root cgrp_dfl_root;
extern struct css_set init_css_set;
+extern spinlock_t css_set_lock;

#define SUBSYS(_x) extern struct cgroup_subsys _x ## _cgrp_subsys;
#include <linux/cgroup_subsys.h>
@@ -386,7 +387,6 @@ static inline void cgroup_unlock(void)
* as locks used during the cgroup_subsys::attach() methods.
*/
#ifdef CONFIG_PROVE_RCU
-extern spinlock_t css_set_lock;
#define task_css_set_check(task, __c) \
rcu_dereference_check((task)->cgroups, \
rcu_read_lock_sched_held() || \
@@ -853,4 +853,6 @@ static inline void cgroup_bpf_put(struct cgroup *cgrp) {}

#endif /* CONFIG_CGROUP_BPF */

struct cgroup *task_get_cgroup1(struct task_struct *tsk, int hierarchy_id);

#endif /* _LINUX_CGROUP_H */
2 changes: 1 addition & 1 deletion include/linux/compiler-gcc.h
Original file line number Diff line number Diff line change
@@ -136,7 +136,7 @@
#endif

#define __diag_ignore_all(option, comment) \
-	__diag_GCC(8, ignore, option)
+	__diag(__diag_GCC_ignore option)

/*
* Prior to 9.1, -Wno-alloc-size-larger-than (and therefore the "alloc_size"
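With this change, __diag_ignore_all() suppresses the named warning on every GCC version rather than only through the GCC >= 8 gate. The established usage pattern around kfunc definitions looks like this (the function body is illustrative; the pragma pattern and message string follow existing kernel call sites):

	__diag_push();
	__diag_ignore_all("-Wmissing-prototypes",
			  "Global kfuncs as their definitions will be in BTF");

	__bpf_kfunc int bpf_demo_kfunc(int x)	/* illustrative kfunc */
	{
		return x + 1;
	}

	__diag_pop();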
4 changes: 4 additions & 0 deletions include/linux/tnum.h
@@ -106,6 +106,10 @@ int tnum_sbin(char *str, size_t size, struct tnum a);
struct tnum tnum_subreg(struct tnum a);
/* Returns the tnum with the lower 32-bit subreg cleared */
struct tnum tnum_clear_subreg(struct tnum a);
/* Returns the tnum with the lower 32-bit subreg in *reg* set to the lower
* 32-bit subreg in *subreg*
*/
struct tnum tnum_with_subreg(struct tnum reg, struct tnum subreg);
/* Returns the tnum with the lower 32-bit subreg set to value */
struct tnum tnum_const_subreg(struct tnum a, u32 value);
/* Returns true if 32-bit subreg @a is a known constant*/
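tnum_with_subreg() can be built from primitives already declared in this header: keep the upper 32 bits of *reg* and splice in the lower 32 bits of *subreg*. A plausible implementation along those lines (the real body lives in kernel/bpf/tnum.c, which is not part of this excerpt):

	struct tnum tnum_with_subreg(struct tnum reg, struct tnum subreg)
	{
		struct tnum lo32, hi32;

		lo32 = tnum_subreg(subreg);			/* low 32 bits of subreg */
		hi32 = tnum_lshift(tnum_rshift(reg, 32), 32);	/* high 32 bits of reg */

		return tnum_or(hi32, lo32);
	}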
29 changes: 13 additions & 16 deletions include/uapi/linux/bpf.h
@@ -1200,6 +1200,9 @@
*/
#define BPF_F_XDP_DEV_BOUND_ONLY (1U << 6)

/* The verifier internal test flag. Behavior is undefined */
#define BPF_F_TEST_REG_INVARIANTS (1U << 7)

/* link_create.kprobe_multi.flags used in LINK_CREATE command for
* BPF_TRACE_KPROBE_MULTI attach type to create return probe.
*/
@@ -4517,6 +4520,8 @@ union bpf_attr {
* long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
* Description
* Return a user or a kernel stack in bpf program provided buffer.
* Note: the user stack will only be populated if the *task* is
* the current task; all other tasks will return -EOPNOTSUPP.
* To achieve this, the helper needs *task*, which is a valid
* pointer to **struct task_struct**. To store the stacktrace, the
* bpf program provides *buf* with a nonnegative *size*.
@@ -4528,6 +4533,7 @@
*
* **BPF_F_USER_STACK**
* Collect a user space stack instead of a kernel stack.
* The *task* must be the current task.
* **BPF_F_USER_BUILD_ID**
* Collect buildid+offset instead of ips for user stack,
* only valid if **BPF_F_USER_STACK** is also specified.
@@ -7151,40 +7157,31 @@ struct bpf_spin_lock {
};

struct bpf_timer {
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[2];
} __attribute__((aligned(8)));

struct bpf_dynptr {
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[2];
} __attribute__((aligned(8)));

struct bpf_list_head {
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[2];
} __attribute__((aligned(8)));

struct bpf_list_node {
-	__u64 :64;
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[3];
} __attribute__((aligned(8)));

struct bpf_rb_root {
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[2];
} __attribute__((aligned(8)));

struct bpf_rb_node {
-	__u64 :64;
-	__u64 :64;
-	__u64 :64;
-	__u64 :64;
+	__u64 __opaque[4];
} __attribute__((aligned(8)));

struct bpf_refcount {
-	__u32 :32;
+	__u32 __opaque[1];
} __attribute__((aligned(4)));

struct bpf_sysctl {
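BPF_F_TEST_REG_INVARIANTS is a load-time prog_flags bit (the veristat -r option in the shortlog sets it the same way). A minimal userspace sketch using libbpf's load options (program type, name, and insns are placeholders):

	#include <bpf/bpf.h>
	#include <linux/bpf.h>

	int load_with_reg_invariants(const struct bpf_insn *insns, size_t insn_cnt)
	{
		LIBBPF_OPTS(bpf_prog_load_opts, opts,
			    .prog_flags = BPF_F_TEST_REG_INVARIANTS);

		/* Fail verification outright on register invariant violations
		 * instead of merely logging them.
		 */
		return bpf_prog_load(BPF_PROG_TYPE_XDP, "demo_prog", "GPL",
				     insns, insn_cnt, &opts);
	}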
11 changes: 4 additions & 7 deletions kernel/bpf/btf.c
@@ -3840,9 +3840,6 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
return ERR_PTR(ret);
}

-#define GRAPH_ROOT_MASK (BPF_LIST_HEAD | BPF_RB_ROOT)
-#define GRAPH_NODE_MASK (BPF_LIST_NODE | BPF_RB_NODE)

int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
{
int i;
@@ -3855,13 +3852,13 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
* Hence we only need to ensure that bpf_{list_head,rb_root} ownership
* does not form cycles.
*/
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & GRAPH_ROOT_MASK))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_GRAPH_ROOT))
return 0;
for (i = 0; i < rec->cnt; i++) {
struct btf_struct_meta *meta;
u32 btf_id;

-		if (!(rec->fields[i].type & GRAPH_ROOT_MASK))
+		if (!(rec->fields[i].type & BPF_GRAPH_ROOT))
continue;
btf_id = rec->fields[i].graph_root.value_btf_id;
meta = btf_find_struct_meta(btf, btf_id);
@@ -3873,7 +3870,7 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
* to check ownership cycle for a type unless it's also a
* node type.
*/
-		if (!(rec->field_mask & GRAPH_NODE_MASK))
+		if (!(rec->field_mask & BPF_GRAPH_NODE))
continue;

/* We need to ensure ownership acyclicity among all types. The
@@ -3909,7 +3906,7 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
* - A is both a root and a node.
* - B is only a node.
*/
-		if (meta->record->field_mask & GRAPH_ROOT_MASK)
+		if (meta->record->field_mask & BPF_GRAPH_ROOT)
return -ELOOP;
}
return 0;
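Concretely, the renamed masks guard against map value types whose graph roots own each other. A sketch of an ownership cycle that btf_check_and_fixup_fields() rejects with -ELOOP (using the __contains annotation from the selftests' bpf_experimental.h; names are illustrative):

	struct bar;

	struct foo {
		struct bpf_list_head head __contains(bar, node); /* foo owns bar */
		struct bpf_list_node node;
	};

	struct bar {
		struct bpf_list_head head __contains(foo, node); /* bar owns foo: cycle */
		struct bpf_list_node node;
	};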
46 changes: 41 additions & 5 deletions kernel/bpf/helpers.c
@@ -1937,10 +1937,7 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu)
ma = &bpf_global_percpu_ma;
else
ma = &bpf_global_ma;
-	if (rec && rec->refcount_off >= 0)
-		bpf_mem_free_rcu(ma, p);
-	else
-		bpf_mem_free(ma, p);
+	bpf_mem_free_rcu(ma, p);
}

__bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
@@ -2231,6 +2228,25 @@ __bpf_kfunc long bpf_task_under_cgroup(struct task_struct *task,
rcu_read_unlock();
return ret;
}

/**
* bpf_task_get_cgroup1 - Acquires the associated cgroup of a task within a
* specific cgroup1 hierarchy. The cgroup1 hierarchy is identified by its
* hierarchy ID.
* @task: The target task
* @hierarchy_id: The ID of a cgroup1 hierarchy
*
* On success, the cgroup is returned. On failure, NULL is returned.
*/
__bpf_kfunc struct cgroup *
bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
{
struct cgroup *cgrp = task_get_cgroup1(task, hierarchy_id);

if (IS_ERR(cgrp))
return NULL;
return cgrp;
}
#endif /* CONFIG_CGROUPS */

/**
@@ -2520,7 +2536,7 @@ BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL | KF_RCU)
BTF_ID_FLAGS(func, bpf_list_push_front_impl)
BTF_ID_FLAGS(func, bpf_list_push_back_impl)
BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
@@ -2537,6 +2553,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_release, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
@@ -2618,3 +2635,22 @@ static int __init kfunc_init(void)
}

late_initcall(kfunc_init);

/* Get a pointer to dynptr data up to len bytes for read only access. If
* the dynptr doesn't have continuous data up to len bytes, return NULL.
*/
const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len)
{
return bpf_dynptr_slice(ptr, 0, NULL, len);
}

/* Get a pointer to dynptr data up to len bytes for read write access. If
* the dynptr doesn't have continuous data up to len bytes, or the dynptr
* is read only, return NULL.
*/
void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len)
{
if (__bpf_dynptr_is_rdonly(ptr))
return NULL;
return (void *)__bpf_dynptr_data(ptr, len);
}
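From a BPF program, the new bpf_task_get_cgroup1() kfunc is paired with bpf_cgroup_release(), like the other KF_ACQUIRE cgroup kfuncs registered above. A hedged usage sketch (the attach point and hierarchy ID 1 are arbitrary choices for illustration):

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	extern struct cgroup *bpf_task_get_cgroup1(struct task_struct *task,
						   int hierarchy_id) __ksym;
	extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

	SEC("tp_btf/task_newtask")
	int BPF_PROG(demo_newtask, struct task_struct *task, u64 clone_flags)
	{
		struct cgroup *cgrp;

		/* NULL if hierarchy 1 is not mounted or the lookup fails. */
		cgrp = bpf_task_get_cgroup1(task, 1);
		if (!cgrp)
			return 0;

		bpf_printk("pid %d cgroup1 id %llu", task->pid, cgrp->kn->id);

		bpf_cgroup_release(cgrp);	/* release the acquired reference */
		return 0;
	}

	char _license[] SEC("license") = "GPL";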