Merge branch 'exceptions-1-2'
Kumar Kartikeya Dwivedi says:

====================
Exceptions - 1/2

This series implements the _first_ part of the runtime and verifier
support needed to enable BPF exceptions. Exceptions thrown from programs
are processed as an immediate exit from the program, which unwinds all
the active stack frames until the main stack frame, and returns to the
BPF program's caller. The ability to perform this unwinding safely
allows the program to test conditions that are always true at runtime
but which the verifier has no visibility into.

Thus, it also reduces verification effort by safely terminating
redundant paths that can be taken within a program.

The patches to perform runtime resource cleanup during the
frame-by-frame unwinding will be posted as a follow-up to this set.

It must be noted that exceptions are not an error handling mechanism for
unlikely runtime conditions, but a way to safely terminate the execution
of a program in the presence of conditions that should never occur at
runtime. They are meant to serve higher-level primitives such as program
assertions.
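
For illustration, an assertion primitive built on top of the bpf_throw()
kfunc introduced below could look roughly like the following sketch (a
hypothetical macro, not part of this series; the actual assertion macros
added in patch 13 are implemented in inline assembly and carry extra
verifier-visible effects):

/* Hypothetical sketch only; not the macros from patch 13. */
#define my_assert(cond, cookie)			\
	do {					\
		if (!(cond))			\
			bpf_throw(cookie);	\
	} while (0)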

The following kfuncs and macros are introduced:

Assertion macros are also introduced; please see patch 13 for their
documentation.

/* Description
 *	Throw a BPF exception from the program, immediately terminating its
 *	execution and unwinding the stack. The supplied 'cookie' parameter
 *	will be the return value of the program when an exception is thrown
 *	and the default exception callback is used. Otherwise, if an exception
 *	callback is set using the '__exception_cb(callback)' declaration tag
 *	on the main program, the 'cookie' parameter will be the callback's only
 *	input argument.
 *
 *	Thus, in case of the default exception callback, 'cookie' is subjected to
 *	constraints on the program's return value (as with R0 on exit).
 *	Otherwise, the return value of the marked exception callback will be
 *	subjected to the same checks.
 *
 *	Note that throwing an exception with lingering resources (locks,
 *	references, etc.) will lead to a verification error.
 *
 *	Note that callbacks *cannot* call this helper.
 * Returns
 *	Never.
 * Throws
 *	An exception with the specified 'cookie' value.
 */
extern void bpf_throw(u64 cookie) __ksym;

/* This macro must be used to mark the exception callback corresponding to the
 * main program. For example:
 *
 * int exception_cb(u64 cookie) {
 *	return cookie;
 * }
 *
 * SEC("tc")
 * __exception_cb(exception_cb)
 * int main_prog(struct __sk_buff *ctx) {
 *	...
 *	return TC_ACT_OK;
 * }
 *
 * Here, the exception callback for the main program will be 'exception_cb'.
 * Note that this attribute can only be used once; specifying multiple
 * exception callbacks for the main program will lead to a verification error.
 */
#define __exception_cb(name) __attribute__((btf_decl_tag("exception_callback:" #name)))

As such, an exception handler can only be installed once for the
lifetime of a BPF program, and it cannot be changed at runtime. The
purpose of the handler is simply to interpret the cookie value supplied
by the bpf_throw call and execute user-defined logic corresponding to
it. The primary reason for allowing a handler is to control the return
value of the program. The default handler returns the cookie value
passed to bpf_throw when an exception is thrown.
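
For a concrete picture of the intended usage, a minimal sketch combining
bpf_throw() and __exception_cb() is shown below. It assumes the
declarations above are available (e.g. via the selftests'
bpf_experimental.h); the program, section, and cookie values are
placeholders:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

int exception_cb(u64 cookie)
{
	/* Interpret the cookie and choose the program's return value. */
	return cookie == 42 ? 2 /* TC_ACT_SHOT */ : 0 /* TC_ACT_OK */;
}

SEC("tc")
__exception_cb(exception_cb)
int main_prog(struct __sk_buff *ctx)
{
	/* A condition that never holds at runtime but that the verifier
	 * cannot rule out: unwind immediately if it is ever observed.
	 */
	if (ctx->len > (1U << 30))
		bpf_throw(42);
	return 0 /* TC_ACT_OK */;
}

char _license[] SEC("license") = "GPL";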

Fixing the handler for the lifetime of the program avoids tricky and
expensive handling of runtime changes to the handler callback when
programs begin to nest, where saving and restoring the active handler
at runtime becomes more complex.

This version of offline unwinding based BPF exceptions is truly zero
overhead, except for the generation of a default callback, which
contains a few instructions to return the cookie value passed to
bpf_throw when no exception callback is supplied by the user.
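
In other words, when the user supplies no exception callback, the
program behaves as if a trivial callback along these lines had been
installed (a sketch of the effect only; the verifier appends a hidden
subprog in BPF instructions rather than compiling C):

int dfl_exception_cb(u64 cookie)
{
	return cookie;
}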

Callbacks are disallowed from throwing BPF exceptions for now, since
such exceptions need to cross the callback helper boundary (and
therefore must care about unwinding kernel state); it may be possible
to lift this restriction in a future follow-up.

Exceptions stop propagating at program boundaries, hence both
BPF_PROG_TYPE_EXT and tail call targets return the exception callback's
return value to their caller context in the event that they throw an
exception. Thus, exceptions do not cross extension or tail call
boundaries.

However, this is mostly an implementation choice, and can be changed to
suit more user-friendly semantics.

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20230809114116.3216687-1-memxor@gmail.com

 * Add Dave's Acked-by.
 * Address all comments from Alexei.
   * Use bpf_is_subprog to check for main prog in bpf_stack_walker.
   * Drop accidental leftover hunk in libbpf patch.
   * Split the libbpf patch's refactoring to aid review.
   * Disable fentry/fexit in addition to freplace for exception cb.
   * Add selftests for fentry/fexit/freplace on exception cb and main prog.
 * Use btf_find_by_name_kind in bpf_find_exception_callback_insn_off (Martin)
 * Split KASAN patch into two to aid backporting (Andrey)
 * Move exception callback append step to bpf_object__relocate (Andrii)
 * Ensure that the exception callback name is unique (Andrii)
 * Keep ASM implementation of assertion macros instead of C, as it does
   not achieve intended results for bpf_assert_range and other cases.

v1 -> v2
v1: https://lore.kernel.org/bpf/20230713023232.1411523-1-memxor@gmail.com

 * Address all comments from Alexei.
 * Fix a few bugs and corner cases in the implementations found during
   testing. Also add new selftests for these cases.
 * Reinstate patch to consider ksym.end part of the program (but
   reworked to cover other corner cases).
 * Implement new style of tagging exception callbacks, add libbpf
   support for the new declaration tag.
 * Limit support to 64-bit integer types for assertion macros. The
   compiler ends up performing shifts or bitwise AND operations when
   finally making use of the value, which defeats the purpose of the
   macro. In noalu32 mode, the shifts may also happen before use,
   hurting reliability.
 * Comprehensively test assertion macros and their side effects on the
   verifier state, register bounds, etc.
 * Fix a KASAN false positive warning.

RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/20230405004239.1375399-1-memxor@gmail.com

 * Completely rework the unwinding infrastructure to use offline
   unwinding support.
 * Remove the runtime exception state and program rewriting code.
 * Make bpf_set_exception_callback idempotent to avoid vexing
   synchronization and state clobbering issues in the presence of program
   nesting.
 * Disable bpf_throw within callback functions, for now.
 * Allow bpf_throw in tail call programs and extension programs,
   removing limitations of rewrite based unwinding.
 * Expand selftests.
====================

Link: https://lore.kernel.org/r/20230912233214.1518551-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov committed Sep 16, 2023
2 parents c4ab64e + d2a9371 commit ec6f1b4
Showing 22 changed files with 2,371 additions and 124 deletions.
2 changes: 1 addition & 1 deletion arch/arm64/net/bpf_jit_comp.c
@@ -288,7 +288,7 @@ static bool is_lsi_offset(int offset, int scale)
static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
{
const struct bpf_prog *prog = ctx->prog;
const bool is_main_prog = prog->aux->func_idx == 0;
const bool is_main_prog = !bpf_is_subprog(prog);
const u8 r6 = bpf2a64[BPF_REG_6];
const u8 r7 = bpf2a64[BPF_REG_7];
const u8 r8 = bpf2a64[BPF_REG_8];
2 changes: 1 addition & 1 deletion arch/s390/net/bpf_jit_comp.c
@@ -556,7 +556,7 @@ static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp,
EMIT6_PCREL_RILC(0xc0040000, 0, jit->prologue_plt);
jit->prologue_plt_ret = jit->prg;

if (fp->aux->func_idx == 0) {
if (!bpf_is_subprog(fp)) {
/* Initialize the tail call counter in the main program. */
/* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */
_EMIT6(0xd703f000 | STK_OFF_TCCNT, 0xf000 | STK_OFF_TCCNT);
117 changes: 105 additions & 12 deletions arch/x86/net/bpf_jit_comp.c
@@ -16,6 +16,9 @@
#include <asm/set_memory.h>
#include <asm/nospec-branch.h>
#include <asm/text-patching.h>
#include <asm/unwind.h>

static bool all_callee_regs_used[4] = {true, true, true, true};

static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
{
@@ -255,6 +258,14 @@ struct jit_context {
/* Number of bytes that will be skipped on tailcall */
#define X86_TAIL_CALL_OFFSET (11 + ENDBR_INSN_SIZE)

static void push_r12(u8 **pprog)
{
u8 *prog = *pprog;

EMIT2(0x41, 0x54); /* push r12 */
*pprog = prog;
}

static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
{
u8 *prog = *pprog;
@@ -270,6 +281,14 @@ static void push_callee_regs(u8 **pprog, bool *callee_regs_used)
*pprog = prog;
}

static void pop_r12(u8 **pprog)
{
u8 *prog = *pprog;

EMIT2(0x41, 0x5C); /* pop r12 */
*pprog = prog;
}

static void pop_callee_regs(u8 **pprog, bool *callee_regs_used)
{
u8 *prog = *pprog;
@@ -291,7 +310,8 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used)
* while jumping to another program
*/
static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
bool tail_call_reachable, bool is_subprog)
bool tail_call_reachable, bool is_subprog,
bool is_exception_cb)
{
u8 *prog = *pprog;

@@ -311,8 +331,22 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
/* Keep the same instruction layout. */
EMIT2(0x66, 0x90); /* nop2 */
}
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
/* Exception callback receives FP as third parameter */
if (is_exception_cb) {
EMIT3(0x48, 0x89, 0xF4); /* mov rsp, rsi */
EMIT3(0x48, 0x89, 0xD5); /* mov rbp, rdx */
/* The main frame must have exception_boundary as true, so we
* first restore those callee-saved regs from stack, before
* reusing the stack frame.
*/
pop_callee_regs(&prog, all_callee_regs_used);
pop_r12(&prog);
/* Reset the stack frame. */
EMIT3(0x48, 0x89, 0xEC); /* mov rsp, rbp */
} else {
EMIT1(0x55); /* push rbp */
EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
}

/* X86_TAIL_CALL_OFFSET is here */
EMIT_ENDBR();
@@ -471,7 +505,8 @@ static void emit_return(u8 **pprog, u8 *ip)
* goto *(prog->bpf_func + prologue_size);
* out:
*/
static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
u8 **pprog, bool *callee_regs_used,
u32 stack_depth, u8 *ip,
struct jit_context *ctx)
{
@@ -521,7 +556,12 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
offset = ctx->tail_call_indirect_label - (prog + 2 - start);
EMIT2(X86_JE, offset); /* je out */

pop_callee_regs(&prog, callee_regs_used);
if (bpf_prog->aux->exception_boundary) {
pop_callee_regs(&prog, all_callee_regs_used);
pop_r12(&prog);
} else {
pop_callee_regs(&prog, callee_regs_used);
}

EMIT1(0x58); /* pop rax */
if (stack_depth)
@@ -545,7 +585,8 @@ static void emit_bpf_tail_call_indirect(u8 **pprog, bool *callee_regs_used,
*pprog = prog;
}

static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
struct bpf_jit_poke_descriptor *poke,
u8 **pprog, u8 *ip,
bool *callee_regs_used, u32 stack_depth,
struct jit_context *ctx)
@@ -574,7 +615,13 @@ static void emit_bpf_tail_call_direct(struct bpf_jit_poke_descriptor *poke,
emit_jump(&prog, (u8 *)poke->tailcall_target + X86_PATCH_SIZE,
poke->tailcall_bypass);

pop_callee_regs(&prog, callee_regs_used);
if (bpf_prog->aux->exception_boundary) {
pop_callee_regs(&prog, all_callee_regs_used);
pop_r12(&prog);
} else {
pop_callee_regs(&prog, callee_regs_used);
}

EMIT1(0x58); /* pop rax */
if (stack_depth)
EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
@@ -1049,8 +1096,20 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image

emit_prologue(&prog, bpf_prog->aux->stack_depth,
bpf_prog_was_classic(bpf_prog), tail_call_reachable,
bpf_prog->aux->func_idx != 0);
push_callee_regs(&prog, callee_regs_used);
bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
/* Exception callback will clobber callee regs for its own use, and
* restore the original callee regs from main prog's stack frame.
*/
if (bpf_prog->aux->exception_boundary) {
/* We also need to save r12, which is not mapped to any BPF
* register, as we throw after entry into the kernel, which may
* overwrite r12.
*/
push_r12(&prog);
push_callee_regs(&prog, all_callee_regs_used);
} else {
push_callee_regs(&prog, callee_regs_used);
}

ilen = prog - temp;
if (rw_image)
@@ -1647,13 +1706,15 @@ st: if (is_imm8(insn->off))

case BPF_JMP | BPF_TAIL_CALL:
if (imm32)
emit_bpf_tail_call_direct(&bpf_prog->aux->poke_tab[imm32 - 1],
emit_bpf_tail_call_direct(bpf_prog,
&bpf_prog->aux->poke_tab[imm32 - 1],
&prog, image + addrs[i - 1],
callee_regs_used,
bpf_prog->aux->stack_depth,
ctx);
else
emit_bpf_tail_call_indirect(&prog,
emit_bpf_tail_call_indirect(bpf_prog,
&prog,
callee_regs_used,
bpf_prog->aux->stack_depth,
image + addrs[i - 1],
@@ -1906,7 +1967,12 @@ st: if (is_imm8(insn->off))
seen_exit = true;
/* Update cleanup_addr */
ctx->cleanup_addr = proglen;
pop_callee_regs(&prog, callee_regs_used);
if (bpf_prog->aux->exception_boundary) {
pop_callee_regs(&prog, all_callee_regs_used);
pop_r12(&prog);
} else {
pop_callee_regs(&prog, callee_regs_used);
}
EMIT1(0xC9); /* leave */
emit_return(&prog, image + addrs[i - 1] + (prog - temp));
break;
@@ -2933,3 +2999,30 @@ void bpf_jit_free(struct bpf_prog *prog)

bpf_prog_unlock_free(prog);
}

bool bpf_jit_supports_exceptions(void)
{
/* We unwind through both kernel frames (starting from within bpf_throw
* call) and BPF frames. Therefore we require one of ORC or FP unwinder
* to be enabled to walk kernel frames and reach BPF frames in the stack
* trace.
*/
return IS_ENABLED(CONFIG_UNWINDER_ORC) || IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER);
}

void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
{
#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
struct unwind_state state;
unsigned long addr;

for (unwind_start(&state, current, NULL, NULL); !unwind_done(&state);
unwind_next_frame(&state)) {
addr = unwind_get_return_address(&state);
if (!addr || !consume_fn(cookie, (u64)addr, (u64)state.sp, (u64)state.bp))
break;
}
return;
#endif
WARN(1, "verification of programs using bpf_throw should have failed\n");
}
13 changes: 12 additions & 1 deletion include/linux/bpf.h
@@ -1389,6 +1389,7 @@ struct bpf_prog_aux {
u32 stack_depth;
u32 id;
u32 func_cnt; /* used by non-func prog as the number of func progs */
u32 real_func_cnt; /* includes hidden progs, only used for JIT and freeing progs */
u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
u32 attach_btf_id; /* in-kernel BTF type id to attach to */
u32 ctx_arg_info_size;
@@ -1409,6 +1410,8 @@ struct bpf_prog_aux {
bool sleepable;
bool tail_call_reachable;
bool xdp_has_frags;
bool exception_cb;
bool exception_boundary;
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
const struct btf_type *attach_func_proto;
/* function name for valid attach_btf_id */
@@ -1431,6 +1434,7 @@ struct bpf_prog_aux {
int cgroup_atype; /* enum cgroup_bpf_attach_type */
struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
char name[BPF_OBJ_NAME_LEN];
unsigned int (*bpf_exception_cb)(u64 cookie, u64 sp, u64 bp);
#ifdef CONFIG_SECURITY
void *security;
#endif
@@ -2418,9 +2422,11 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog,
int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *regs);
int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
struct bpf_reg_state *reg);
struct bpf_reg_state *reg, bool is_ex_cb);
int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
struct btf *btf, const struct btf_type *t);
const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key);

struct bpf_prog *bpf_prog_by_id(u32 id);
struct bpf_link *bpf_link_by_id(u32 id);
@@ -3194,4 +3200,9 @@ static inline gfp_t bpf_memcg_flags(gfp_t flags)
return flags;
}

static inline bool bpf_is_subprog(const struct bpf_prog *prog)
{
return prog->aux->func_idx != 0;
}

#endif /* _LINUX_BPF_H */
8 changes: 7 additions & 1 deletion include/linux/bpf_verifier.h
@@ -300,6 +300,7 @@ struct bpf_func_state {
bool in_callback_fn;
struct tnum callback_ret_range;
bool in_async_callback_fn;
bool in_exception_callback_fn;

/* The following fields should be last. See copy_func_state() */
int acquired_refs;
@@ -541,7 +542,9 @@ struct bpf_subprog_info {
bool has_tail_call;
bool tail_call_reachable;
bool has_ld_abs;
bool is_cb;
bool is_async_cb;
bool is_exception_cb;
};

struct bpf_verifier_env;
@@ -588,17 +591,20 @@ struct bpf_verifier_env {
u32 used_map_cnt; /* number of used maps */
u32 used_btf_cnt; /* number of used BTF objects */
u32 id_gen; /* used to generate unique reg IDs */
u32 hidden_subprog_cnt; /* number of hidden subprogs */
int exception_callback_subprog;
bool explore_alu_limits;
bool allow_ptr_leaks;
bool allow_uninit_stack;
bool bpf_capable;
bool bypass_spec_v1;
bool bypass_spec_v4;
bool seen_direct_write;
bool seen_exception;
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
const struct bpf_line_info *prev_linfo;
struct bpf_verifier_log log;
struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 2]; /* max + 2 for the fake and exception subprogs */
union {
struct bpf_idmap idmap_scratch;
struct bpf_idset idset_scratch;
8 changes: 8 additions & 0 deletions include/linux/filter.h
@@ -954,6 +954,8 @@ bool bpf_jit_needs_zext(void);
bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_jit_supports_far_kfunc_call(void);
bool bpf_jit_supports_exceptions(void);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func);

static inline bool bpf_dump_raw_ok(const struct cred *cred)
@@ -1169,6 +1171,7 @@ const char *__bpf_address_lookup(unsigned long addr, unsigned long *size,
bool is_bpf_text_address(unsigned long addr);
int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
char *sym);
struct bpf_prog *bpf_prog_ksym_find(unsigned long addr);

static inline const char *
bpf_address_lookup(unsigned long addr, unsigned long *size,
@@ -1236,6 +1239,11 @@ static inline int bpf_get_kallsym(unsigned int symnum, unsigned long *value,
return -ERANGE;
}

static inline struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
{
return NULL;
}

static inline const char *
bpf_address_lookup(unsigned long addr, unsigned long *size,
unsigned long *off, char **modname, char *sym)
2 changes: 2 additions & 0 deletions include/linux/kasan.h
@@ -283,8 +283,10 @@ static inline bool kasan_check_byte(const void *address)

#if defined(CONFIG_KASAN) && defined(CONFIG_KASAN_STACK)
void kasan_unpoison_task_stack(struct task_struct *task);
asmlinkage void kasan_unpoison_task_stack_below(const void *watermark);
#else
static inline void kasan_unpoison_task_stack(struct task_struct *task) {}
static inline void kasan_unpoison_task_stack_below(const void *watermark) {}
#endif

#ifdef CONFIG_KASAN_GENERIC