Merge branch 'tcp-bpf-cc'
Martin Lau says:

====================
This series introduces BPF STRUCT_OPS.  It is an infrastructure that allows
implementing some specific kernel function pointers in BPF.
The first use case included in this series is to implement
a TCP congestion control algorithm in BPF (i.e. implement
struct tcp_congestion_ops in BPF).

There have been attempts to move the TCP CC to user space
(e.g. CCP in TCP).  The common arguments are a faster turnaround,
getting away from long-tail kernel versions in production, etc.,
which are legitimate points.

BPF has been a continuous effort to join the upsides of both kernel and
user space (e.g. XDP gains the performance advantage without
bypassing the kernel).  The recent BPF advancements (in particular
the BTF-aware verifier, BPF trampoline, and BPF CO-RE) have made
implementing kernel struct ops (e.g. tcp cc) possible in BPF.

The idea is to allow implementing tcp_congestion_ops in bpf.
It allows a faster turnaround for testing algorithms in
production while leveraging the existing (and continually growing)
BPF features/framework instead of building one specifically for
user-space TCP CC.  A rough sketch of what such a bpf-tcp-cc can
look like is shown below.
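As a rough sketch (not the exact selftest from this series; the headers,
the BPF_PROG macro from libbpf's bpf_tracing.h, and the "sketch"/"bpf_sketch"
names are assumptions for illustration only), a bpf-tcp-cc can look like this:

/* Implement a few tcp_congestion_ops members as BPF programs and expose
 * them to the kernel through a SEC(".struct_ops") map.
 */
SEC("struct_ops/sketch_ssthresh")
__u32 BPF_PROG(sketch_ssthresh, struct sock *sk)
{
	const struct tcp_sock *tp = (const struct tcp_sock *)sk;

	/* Reno-style: halve cwnd on loss, never below 2 */
	return tp->snd_cwnd > 2 ? tp->snd_cwnd >> 1 : 2;
}

SEC("struct_ops/sketch_undo_cwnd")
__u32 BPF_PROG(sketch_undo_cwnd, struct sock *sk)
{
	const struct tcp_sock *tp = (const struct tcp_sock *)sk;

	return tp->snd_cwnd > tp->prior_cwnd ? tp->snd_cwnd : tp->prior_cwnd;
}

SEC("struct_ops/sketch_cong_avoid")
void BPF_PROG(sketch_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
{
	struct tcp_sock *tp = (struct tcp_sock *)sk;

	if (tp->snd_cwnd < tp->snd_ssthresh) {
		tp->snd_cwnd += acked;			/* slow start */
	} else if (++tp->snd_cwnd_cnt >= tp->snd_cwnd) {
		tp->snd_cwnd++;				/* congestion avoidance */
		tp->snd_cwnd_cnt = 0;
	}
}

SEC(".struct_ops")
struct tcp_congestion_ops sketch = {
	.ssthresh	= (void *)sketch_ssthresh,
	.undo_cwnd	= (void *)sketch_undo_cwnd,
	.cong_avoid	= (void *)sketch_cong_avoid,
	.name		= "bpf_sketch",
};

char _license[] SEC("license") = "GPL";

User space then loads the object and attaches the ".struct_ops" map, e.g.
with the bpf_map__attach_struct_ops() libbpf API added in this series.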

Please see the individual patches for details.

The bpftool support will be posted in follow-up patches.

v4:
- Expose tcp_ca_find() to tcp.h in patch 7.
  It is used to check that the same bpf-tcp-cc
  does not already exist, to guarantee that the register()
  will succeed.
- set_memory_ro() and then set_memory_x() only after all
  trampolines are written to the image in patch 6. (Daniel)
  The spinlock is replaced by a mutex because set_memory_*()
  requires a sleepable context.

v3:
- Fix kbuild error by considering CONFIG_BPF_SYSCALL (kbuild)
- Support anonymous bitfield in patch 4 (Andrii, Yonghong)
- Push the boundary safety check into the arch-specific trampoline function
  (in patch 6) (Yonghong).
  Reuse the WARN_ON_ONCE check in arch_prepare_bpf_trampoline() in x86.
- Check module field is 0 in udata in patch 6 (Yonghong)
- Check zero holes in patch 6 (Andrii)
- s/_btf_vmlinux/btf/ in patch 5 and 7 (Andrii)
- s/check_xxx/is_xxx/ in patch 7 (Andrii)
- Use "struct_ops/" convention in patch 11 (Andrii)
- Use the skel instead of bpf_object in patch 11 (Andrii)
- libbpf: Decide BPF_PROG_TYPE_STRUCT_OPS at open phase by using
          find_sec_def()
- libbpf: Avoid a debug message at open phase (Andrii)
- libbpf: Add bpf_program__(is|set)_struct_ops() for consistency (Andrii)
- libbpf: Add "struct_ops" to section_defs (Andrii)
- libbpf: Some code shuffling in init_kern_struct_ops() (Andrii)
- libbpf: A few safety checks (Andrii)

v2:
- Dropped cubic for now.  It will be reposted
  once there is more clarity on "jiffies" on both
  the bpf side (about the helper) and
  the tcp_cubic side (some of the jiffies usages are being replaced
  by tp->tcp_mstamp)
- Remove unnecessary check on bitfield support from btf_struct_access()
  (Yonghong)
- BTF_TYPE_EMIT macro (Yonghong, Andrii)
- Check value_name's length to avoid an unlikely
  type match in the truncation case (Yonghong)
- BUILD_BUG_ON to ensure no trampoline-image overrun
  in the future (Yonghong)
- Simplify get_next_key() (Yonghong)
- Added comment to explain how to check mandatory
  func ptr in net/ipv4/bpf_tcp_ca.c (Yonghong)
- Rename "__bpf_" to "bpf_struct_ops_" for value prefix (Andrii)
- Add a comment to highlight that bpf_dctcp.c is not necessarily
  the same as tcp_dctcp.c. (Alexei, Eric)
- libbpf: Rename "struct_ops" to ".struct_ops" for the elf sec (Andrii)
- libbpf: Expose struct_ops as a bpf_map (Andrii)
- libbpf: Support multiple struct_ops in SEC(".struct_ops") (Andrii)
- libbpf: Add bpf_map__attach_struct_ops()  (Andrii)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov committed Jan 9, 2020
2 parents 65726b5 + 0990386 commit 417759f
Showing 32 changed files with 2,654 additions and 148 deletions.
18 changes: 8 additions & 10 deletions arch/x86/net/bpf_jit_comp.c
@@ -1328,7 +1328,7 @@ xadd: if (is_imm8(insn->off))
return proglen;
}

static void save_regs(struct btf_func_model *m, u8 **prog, int nr_args,
static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
@@ -1344,7 +1344,7 @@ static void save_regs(struct btf_func_model *m, u8 **prog, int nr_args,
-(stack_size - i * 8));
}

static void restore_regs(struct btf_func_model *m, u8 **prog, int nr_args,
static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_args,
int stack_size)
{
int i;
@@ -1361,7 +1361,7 @@ static void restore_regs(struct btf_func_model *m, u8 **prog, int nr_args,
-(stack_size - i * 8));
}

static int invoke_bpf(struct btf_func_model *m, u8 **pprog,
static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
struct bpf_prog **progs, int prog_cnt, int stack_size)
{
u8 *prog = *pprog;
@@ -1456,7 +1456,8 @@ static int invoke_bpf(struct btf_func_model *m, u8 **pprog,
* add rsp, 8 // skip eth_type_trans's frame
* ret // return to its caller
*/
int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
int arch_prepare_bpf_trampoline(void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
struct bpf_prog **fentry_progs, int fentry_cnt,
struct bpf_prog **fexit_progs, int fexit_cnt,
void *orig_call)
@@ -1523,13 +1524,10 @@ int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags
/* skip our return address and return to parent */
EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
EMIT1(0xC3); /* ret */
/* One half of the page has active running trampoline.
* Another half is an area for next trampoline.
* Make sure the trampoline generation logic doesn't overflow.
*/
if (WARN_ON_ONCE(prog - (u8 *)image > PAGE_SIZE / 2 - BPF_INSN_SAFETY))
/* Make sure the trampoline generation logic doesn't overflow */
if (WARN_ON_ONCE(prog > (u8 *)image_end - BPF_INSN_SAFETY))
return -EFAULT;
return 0;
return prog - (u8 *)image;
}

static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
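Beyond the diff above, a hedged sketch of the new calling contract (variable
names here are illustrative, not lifted from the series): the caller passes
the writable window [image, image_end) and a positive return value is the
number of bytes emitted.

	int size;

	size = arch_prepare_bpf_trampoline(image, image + PAGE_SIZE / 2,
					   &model, flags,
					   fentry_progs, fentry_cnt,
					   fexit_progs, fexit_cnt, orig_call);
	if (size < 0)
		return size;	/* e.g. -EFAULT if the image would overflow */
	/* success: 'size' bytes of trampoline now live in [image, image + size) */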
79 changes: 77 additions & 2 deletions include/linux/bpf.h
@@ -17,6 +17,7 @@
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h>
#include <linux/mutex.h>
#include <linux/module.h>

struct bpf_verifier_env;
struct bpf_verifier_log;
@@ -106,6 +107,7 @@ struct bpf_map {
struct btf *btf;
struct bpf_map_memory memory;
char name[BPF_OBJ_NAME_LEN];
u32 btf_vmlinux_value_type_id;
bool unpriv_array;
bool frozen; /* write-once; write-protected by freeze_mutex */
/* 22 bytes hole */
@@ -183,7 +185,8 @@ static inline bool bpf_map_offload_neutral(const struct bpf_map *map)

static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
{
return map->btf && map->ops->map_seq_show_elem;
return (map->btf_value_type_id || map->btf_vmlinux_value_type_id) &&
map->ops->map_seq_show_elem;
}

int map_check_no_btf(const struct bpf_map *map,
@@ -349,6 +352,10 @@ struct bpf_verifier_ops {
const struct bpf_insn *src,
struct bpf_insn *dst,
struct bpf_prog *prog, u32 *target_size);
int (*btf_struct_access)(struct bpf_verifier_log *log,
const struct btf_type *t, int off, int size,
enum bpf_access_type atype,
u32 *next_btf_id);
};

struct bpf_prog_offload_ops {
@@ -437,7 +444,8 @@ struct btf_func_model {
* fentry = a set of program to run before calling original function
* fexit = a set of program to run after original function
*/
int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
int arch_prepare_bpf_trampoline(void *image, void *image_end,
const struct btf_func_model *m, u32 flags,
struct bpf_prog **fentry_progs, int fentry_cnt,
struct bpf_prog **fexit_progs, int fexit_cnt,
void *orig_call);
@@ -668,6 +676,73 @@ struct bpf_array_aux {
struct work_struct work;
};

struct bpf_struct_ops_value;
struct btf_type;
struct btf_member;

#define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64
struct bpf_struct_ops {
const struct bpf_verifier_ops *verifier_ops;
int (*init)(struct btf *btf);
int (*check_member)(const struct btf_type *t,
const struct btf_member *member);
int (*init_member)(const struct btf_type *t,
const struct btf_member *member,
void *kdata, const void *udata);
int (*reg)(void *kdata);
void (*unreg)(void *kdata);
const struct btf_type *type;
const struct btf_type *value_type;
const char *name;
struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS];
u32 type_id;
u32 value_id;
};

#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
#define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id);
void bpf_struct_ops_init(struct btf *btf);
bool bpf_struct_ops_get(const void *kdata);
void bpf_struct_ops_put(const void *kdata);
int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
void *value);
static inline bool bpf_try_module_get(const void *data, struct module *owner)
{
if (owner == BPF_MODULE_OWNER)
return bpf_struct_ops_get(data);
else
return try_module_get(owner);
}
static inline void bpf_module_put(const void *data, struct module *owner)
{
if (owner == BPF_MODULE_OWNER)
bpf_struct_ops_put(data);
else
module_put(owner);
}
#else
static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
{
return NULL;
}
static inline void bpf_struct_ops_init(struct btf *btf) { }
static inline bool bpf_try_module_get(const void *data, struct module *owner)
{
return try_module_get(owner);
}
static inline void bpf_module_put(const void *data, struct module *owner)
{
module_put(owner);
}
static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map,
void *key,
void *value)
{
return -EINVAL;
}
#endif

struct bpf_array {
struct bpf_map map;
u32 elem_size;
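For orientation, here is a hedged sketch of how a subsystem plugs into the
struct bpf_struct_ops interface above, modelled loosely on the
tcp_congestion_ops support added later in this series (callback bodies are
omitted and names are approximate):

static const struct bpf_verifier_ops bpf_tcp_ca_verifier_ops = {
	.get_func_proto		= bpf_tcp_ca_get_func_proto,
	.is_valid_access	= bpf_tcp_ca_is_valid_access,
	.btf_struct_access	= bpf_tcp_ca_btf_struct_access,
};

struct bpf_struct_ops bpf_tcp_congestion_ops = {
	.verifier_ops	= &bpf_tcp_ca_verifier_ops,
	.init		= bpf_tcp_ca_init,		/* resolve the BTF ids it needs */
	.check_member	= bpf_tcp_ca_check_member,	/* which members may be BPF progs */
	.init_member	= bpf_tcp_ca_init_member,	/* copy/validate each udata member */
	.reg		= bpf_tcp_ca_reg,		/* tcp_register_congestion_control() */
	.unreg		= bpf_tcp_ca_unreg,		/* tcp_unregister_congestion_control() */
	.name		= "tcp_congestion_ops",		/* matched against kernel BTF */
};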
7 changes: 7 additions & 0 deletions include/linux/bpf_types.h
@@ -65,6 +65,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
struct sk_reuseport_md, struct sk_reuseport_kern)
#endif
#if defined(CONFIG_BPF_JIT)
BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
void *, void *)
#endif

BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
@@ -105,3 +109,6 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
#if defined(CONFIG_BPF_JIT)
BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif
47 changes: 47 additions & 0 deletions include/linux/btf.h
@@ -7,6 +7,8 @@
#include <linux/types.h>
#include <uapi/linux/btf.h>

#define BTF_TYPE_EMIT(type) ((void)(type *)0)

struct btf;
struct btf_member;
struct btf_type;
@@ -53,6 +55,22 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
u32 expected_offset, u32 expected_size);
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
u32 id, u32 *res_id);
const struct btf_type *btf_type_resolve_ptr(const struct btf *btf,
u32 id, u32 *res_id);
const struct btf_type *btf_type_resolve_func_ptr(const struct btf *btf,
u32 id, u32 *res_id);
const struct btf_type *
btf_resolve_size(const struct btf *btf, const struct btf_type *type,
u32 *type_size, const struct btf_type **elem_type,
u32 *total_nelems);

#define for_each_member(i, struct_type, member) \
for (i = 0, member = btf_type_member(struct_type); \
i < btf_type_vlen(struct_type); \
i++, member++)

static inline bool btf_type_is_ptr(const struct btf_type *t)
{
@@ -84,6 +102,35 @@ static inline bool btf_type_is_func_proto(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_FUNC_PROTO;
}

static inline u16 btf_type_vlen(const struct btf_type *t)
{
return BTF_INFO_VLEN(t->info);
}

static inline bool btf_type_kflag(const struct btf_type *t)
{
return BTF_INFO_KFLAG(t->info);
}

static inline u32 btf_member_bit_offset(const struct btf_type *struct_type,
const struct btf_member *member)
{
return btf_type_kflag(struct_type) ? BTF_MEMBER_BIT_OFFSET(member->offset)
: member->offset;
}

static inline u32 btf_member_bitfield_size(const struct btf_type *struct_type,
const struct btf_member *member)
{
return btf_type_kflag(struct_type) ? BTF_MEMBER_BITFIELD_SIZE(member->offset)
: 0;
}

static inline const struct btf_member *btf_type_member(const struct btf_type *t)
{
return (const struct btf_member *)(t + 1);
}

#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);
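A hedged usage sketch of the new member accessors (the struct type "t" and
the "btf" pointer are assumed to have been looked up already, e.g. via
btf_find_by_name_kind() and btf_type_by_id()):

	const struct btf_member *member;
	u32 i, moff;

	for_each_member(i, t, member) {
		if (btf_member_bitfield_size(t, member))
			continue;	/* real bitfield: no plain byte offset */

		moff = btf_member_bit_offset(t, member) / 8;
		pr_debug("member %s at byte offset %u\n",
			 btf_name_by_offset(btf, member->name_off), moff);
	}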
2 changes: 2 additions & 0 deletions include/linux/filter.h
@@ -843,6 +843,8 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog);
int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
bpf_aux_classic_check_t trans, bool save_orig);
void bpf_prog_destroy(struct bpf_prog *fp);
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id);

int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
int sk_attach_bpf(u32 ufd, struct sock *sk);
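A hedged sketch of why bpf_base_func_proto() is exported here: a new program
type's .get_func_proto callback handles its own helpers and falls back to the
generic ones (the bpf_tcp_send_ack proto below is only illustrative):

static const struct bpf_func_proto *
bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
	switch (func_id) {
	case BPF_FUNC_tcp_send_ack:
		return &bpf_tcp_send_ack_proto;		/* prog-type specific helper */
	default:
		return bpf_base_func_proto(func_id);	/* generic helpers */
	}
}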
2 changes: 2 additions & 0 deletions include/net/tcp.h
@@ -1007,6 +1007,7 @@ enum tcp_ca_ack_event_flags {
#define TCP_CONG_NON_RESTRICTED 0x1
/* Requires ECN/ECT set on all packets */
#define TCP_CONG_NEEDS_ECN 0x2
#define TCP_CONG_MASK (TCP_CONG_NON_RESTRICTED | TCP_CONG_NEEDS_ECN)

union tcp_cc_info;

@@ -1101,6 +1102,7 @@ u32 tcp_reno_undo_cwnd(struct sock *sk);
void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked);
extern struct tcp_congestion_ops tcp_reno;

struct tcp_congestion_ops *tcp_ca_find(const char *name);
struct tcp_congestion_ops *tcp_ca_find_key(u32 key);
u32 tcp_ca_get_key_by_name(struct net *net, const char *name, bool *ecn_ca);
#ifdef CONFIG_INET
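Related to the bpf_try_module_get()/bpf_module_put() helpers added in bpf.h
above, a hedged sketch (simplified, not the exact tcp_cong.c code) of how a
caller pins a congestion control that may be backed either by a kernel module
or by a BPF struct_ops map:

static bool sketch_ca_get(const struct tcp_congestion_ops *ca)
{
	/* For a bpf-tcp-cc, ca->owner is BPF_MODULE_OWNER and this pins the
	 * struct_ops map; otherwise it behaves like try_module_get().
	 */
	return bpf_try_module_get(ca, ca->owner);
}

static void sketch_ca_put(const struct tcp_congestion_ops *ca)
{
	bpf_module_put(ca, ca->owner);
}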
19 changes: 17 additions & 2 deletions include/uapi/linux/bpf.h
@@ -136,6 +136,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_STACK,
BPF_MAP_TYPE_SK_STORAGE,
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
};

/* Note that tracing related programs such as
@@ -174,6 +175,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
BPF_PROG_TYPE_CGROUP_SOCKOPT,
BPF_PROG_TYPE_TRACING,
BPF_PROG_TYPE_STRUCT_OPS,
};

enum bpf_attach_type {
@@ -397,6 +399,10 @@ union bpf_attr {
__u32 btf_fd; /* fd pointing to a BTF type data */
__u32 btf_key_type_id; /* BTF type_id of the key */
__u32 btf_value_type_id; /* BTF type_id of the value */
__u32 btf_vmlinux_value_type_id;/* BTF type_id of a kernel-
* struct stored as the
* map value
*/
};

struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -2831,6 +2837,14 @@ union bpf_attr {
* Return
* On success, the strictly positive length of the string, including
* the trailing NUL character. On error, a negative value.
*
* int bpf_tcp_send_ack(void *tp, u32 rcv_nxt)
* Description
* Send out a tcp-ack. *tp* is the in-kernel struct tcp_sock.
* *rcv_nxt* is the ack_seq to be sent out.
* Return
* 0 on success, or a negative error in case of failure.
*
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2948,7 +2962,8 @@ union bpf_attr {
FN(probe_read_user), \
FN(probe_read_kernel), \
FN(probe_read_user_str), \
FN(probe_read_kernel_str),
FN(probe_read_kernel_str), \
FN(tcp_send_ack),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -3349,7 +3364,7 @@ struct bpf_map_info {
__u32 map_flags;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
__u32 :32;
__u32 btf_vmlinux_value_type_id;
__u64 netns_dev;
__u64 netns_ino;
__u32 btf_id;
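A hedged user-space sketch of where btf_vmlinux_value_type_id fits when
creating a struct_ops map with the raw syscall (libbpf performs this lookup
and fill-in automatically; value_size and vmlinux_value_id are assumed to
have been resolved from the vmlinux BTF, e.g. the id of the
"bpf_struct_ops_tcp_congestion_ops" value struct):

	union bpf_attr attr = {};
	int map_fd;

	attr.map_type			= BPF_MAP_TYPE_STRUCT_OPS;
	attr.key_size			= sizeof(__u32);
	attr.value_size			= value_size;		/* size of the vmlinux value struct */
	attr.max_entries		= 1;
	attr.btf_vmlinux_value_type_id	= vmlinux_value_id;

	map_fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (map_fd < 0)
		return -errno;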
3 changes: 3 additions & 0 deletions kernel/bpf/Makefile
@@ -27,3 +27,6 @@ endif
ifeq ($(CONFIG_SYSFS),y)
obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
endif
ifeq ($(CONFIG_BPF_JIT),y)
obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
endif
