Merge branch 'bpf-token'
Andrii Nakryiko says:

====================
BPF token

This patch set combines three BPF token-related patch sets ([0], [1], [2]),
with fixes ([3]) to the kernel-side token_fd passing APIs folded into the
relevant patches. It also includes the bpf_token_capable() changes requested by
Christian Brauner and the necessary libbpf and BPF selftests adjustments.

This patch set introduces the ability to delegate a subset of BPF subsystem
functionality from a privileged system-wide daemon (e.g., systemd or any other
container manager) to a *trusted* unprivileged application, through special
mount options for a userns-bound BPF FS. Trust is the key here: this
functionality is not about allowing unconditional unprivileged BPF usage.
Establishing trust, though, is entirely up to the discretion of the privileged
application that creates and mounts a BPF FS instance with delegation enabled,
as different production setups can and do achieve it through a combination of
different means (signing, LSM, code reviews, etc.). It is undesirable and
infeasible for the kernel to enforce any particular way of validating the
trustworthiness of a particular process.

The main motivation for this work is to enable containerized BPF applications
to be used together with user namespaces. This is currently impossible because
CAP_BPF, which is required for BPF subsystem usage, cannot, as a general rule,
be namespaced or sandboxed. For example, tracing BPF programs, thanks to BPF
helpers like bpf_probe_read_kernel() and bpf_probe_read_user(), can safely read
arbitrary memory, and it's impossible to ensure that they only read the memory
of processes belonging to a given namespace. This means that a mechanically
verifiable, namespace-aware CAP_BPF capability is impossible, so another
mechanism is necessary to allow safe usage of BPF functionality.

BPF FS delegation mount options, together with the BPF token derived from such
a BPF FS instance, are such a mechanism. The kernel makes no assumption about
what "trusted" constitutes in any particular case; it's up to specific
privileged applications and their surrounding infrastructure to decide that.
What the kernel provides is a set of APIs to set up and mount a special BPF FS
instance and derive BPF tokens from it. BPF FS and BPF token are both bound to
their owning userns and are thereby constrained to the intended container.
Users can then pass a BPF token FD to privileged bpf() syscall commands, like
BPF map creation and BPF program loading, to perform such operations without
having init userns privileges.
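
To make the delegation step concrete, here is a minimal userspace model of how
delegation mount options could become a token's allow-lists and then be
checked. It mirrors the delegate_* fields of struct bpf_mount_opts, the
allowed_* fields of struct bpf_token, and the three bpf_token_allow_*()
declarations from the include/linux/bpf.h diff, but every model_* type and
function and the small enums are hypothetical stand-ins. It is a sketch of the
bitmask idea under those assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified enums for this model (not the uapi values). */
enum model_cmd { CMD_MAP_CREATE, CMD_PROG_LOAD, CMD_BTF_LOAD, CMD_MAX };
enum model_map_type { MAP_HASH, MAP_ARRAY, MAP_RINGBUF, MAP_TYPE_MAX };
enum model_prog_type { PROG_XDP, PROG_KPROBE, PROG_TYPE_MAX };
enum model_attach_type { ATTACH_XDP, ATTACH_TRACE_KPROBE, ATTACH_TYPE_MAX };

/* Delegation options a privileged manager sets when mounting BPF FS. */
struct model_mount_opts {
	uint64_t delegate_cmds;
	uint64_t delegate_maps;
	uint64_t delegate_progs;
	uint64_t delegate_attachs;
};

/* A token derived from that mount carries the same masks as allow-lists. */
struct model_token {
	uint64_t allowed_cmds;
	uint64_t allowed_maps;
	uint64_t allowed_progs;
	uint64_t allowed_attachs;
};

static struct model_token model_token_from_mount(const struct model_mount_opts *o)
{
	struct model_token t = {
		.allowed_cmds = o->delegate_cmds,
		.allowed_maps = o->delegate_maps,
		.allowed_progs = o->delegate_progs,
		.allowed_attachs = o->delegate_attachs,
	};
	return t;
}

/* Bit n set in a mask means "enum value n is delegated". */
static bool mask_has(uint64_t mask, unsigned int bit)
{
	return bit < 64 && (mask & (UINT64_C(1) << bit));
}

static bool model_allow_cmd(const struct model_token *t, enum model_cmd cmd)
{
	return t && cmd < CMD_MAX && mask_has(t->allowed_cmds, cmd);
}

static bool model_allow_map_type(const struct model_token *t, enum model_map_type type)
{
	return t && type < MAP_TYPE_MAX && mask_has(t->allowed_maps, type);
}

/* Program loading needs both the program type and the attach type delegated. */
static bool model_allow_prog_type(const struct model_token *t,
				  enum model_prog_type prog,
				  enum model_attach_type attach)
{
	return t && prog < PROG_TYPE_MAX && attach < ATTACH_TYPE_MAX &&
	       mask_has(t->allowed_progs, prog) &&
	       mask_has(t->allowed_attachs, attach);
}
```

A container manager would, in this model, set only the bits it wants to hand
out (say, map creation of array maps and XDP programs); everything else stays
unavailable through the token regardless of what the application requests.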

This version incorporates feedback and suggestions ([4]) received on earlier
iterations of the BPF token approach: instead of allowing BPF tokens to be
created directly under capable(CAP_SYS_ADMIN), we enhance BPF FS to accept a
few new delegation mount options. If these options are used and the BPF FS
itself is properly created, set up, and mounted inside the user-namespaced
container, a user application is able to derive a BPF token object from the BPF
FS instance and pass that token to the bpf() syscall. As explained in patch #3,
the BPF token itself doesn't grant access to BPF functionality; instead, it
allows the kernel to do namespaced capability checks (ns_capable() vs
capable()) for CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as
applicable. So it forms one half of the puzzle and gives container managers and
sys admins safe and flexible configuration options: determining which
containers get delegation of BPF functionality through BPF FS, and then which
applications within such containers are allowed to perform bpf() commands,
based on namespaced capabilities.
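
The capability half can be sketched as a small model. It follows the shape of
the bpf_token_capable() declaration and its !CONFIG_BPF_SYSCALL fallback in the
include/linux/bpf.h diff: a capability is satisfied either by itself or, for
anything other than CAP_SYS_ADMIN, by CAP_SYS_ADMIN. With a token, the check is
assumed to run against the token's owning user namespace instead of the init
namespace, as described above. The model_userns type, its held_caps bitmask,
and the M_CAP_* constants are hypothetical; the real kernel consults task
credentials through ns_capable(), not a field on the namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability numbers for this model only. */
enum model_cap { M_CAP_SYS_ADMIN, M_CAP_NET_ADMIN, M_CAP_PERFMON, M_CAP_BPF };

/* Capabilities the current task holds in one user namespace (model only). */
struct model_userns {
	const char *name;
	uint32_t held_caps;
};

struct model_token {
	const struct model_userns *userns; /* owning userns of the BPF FS */
};

static struct model_userns init_userns = { "init", 0 };

static bool model_ns_capable(const struct model_userns *ns, enum model_cap cap)
{
	return ns->held_caps & (UINT32_C(1) << cap);
}

/* CAP_SYS_ADMIN also satisfies the other BPF-related capabilities. */
static bool model_bpf_ns_capable(const struct model_userns *ns, enum model_cap cap)
{
	return model_ns_capable(ns, cap) ||
	       (cap != M_CAP_SYS_ADMIN && model_ns_capable(ns, M_CAP_SYS_ADMIN));
}

/* No token: check init userns. With a token: check the token's userns. */
static bool model_token_capable(const struct model_token *tok, enum model_cap cap)
{
	const struct model_userns *ns = tok ? tok->userns : &init_userns;

	return model_bpf_ns_capable(ns, cap);
}
```

In this model, a process holding CAP_BPF only inside its container passes a
CAP_BPF check when it presents a token bound to that container's userns, and
fails the same check without one.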

A previous attempt at addressing this very same problem ([5]) tried an
authoritative LSM approach, but it was conclusively rejected by upstream LSM
maintainers. The BPF token concept doesn't change anything about the LSM
approach, but it can be combined with LSM hooks for very fine-grained security
policy. Some ideas about making BPF token more convenient to use with LSM (in
particular, custom BPF LSM programs) were briefly described in a recent
LSF/MM/BPF 2023 presentation ([6]). One example is the ability to specify
user-provided data (context), which, in combination with BPF LSM, would allow
implementing very dynamic and fine-grained custom security policies on top of
BPF token. In the interest of minimizing API surface area and discussion, this
was relegated to follow-up patches, as it's not essential to the fundamental
concept of a delegatable BPF token.
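
A rough model of that layering: the delegation check runs first, and an LSM
hook can only further restrict what a token allows, never widen it. The hook
signature is loosely modeled on the bpf_token_cmd LSM hook added in
include/linux/lsm_hook_defs.h (0 allows, a negative errno denies); the model_*
names, the single global hook pointer, and the example policy are hypothetical
and far simpler than real LSM stacking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified command enum for this model. */
enum model_cmd { CMD_MAP_CREATE, CMD_PROG_LOAD, CMD_BTF_LOAD, CMD_MAX };

struct model_token {
	uint64_t allowed_cmds; /* delegated via BPF FS mount options */
};

/* LSM-style hook: return 0 to allow, a negative errno to deny. */
typedef int (*model_token_cmd_hook)(const struct model_token *tok,
				    enum model_cmd cmd);

/* NULL means no LSM policy is loaded. */
static model_token_cmd_hook active_hook;

static bool model_token_allow_cmd(const struct model_token *tok, enum model_cmd cmd)
{
	if (!tok || cmd >= CMD_MAX)
		return false;
	/* Delegation first: the mount options bound what a token can ever do. */
	if (!(tok->allowed_cmds & (UINT64_C(1) << cmd)))
		return false;
	/* The LSM may only narrow the delegated set, never widen it. */
	if (active_hook && active_hook(tok, cmd) != 0)
		return false;
	return true;
}

/* Example policy: forbid program loading through any token. */
static int deny_prog_load_via_token(const struct model_token *tok,
				    enum model_cmd cmd)
{
	(void)tok;
	return cmd == CMD_PROG_LOAD ? -EPERM : 0;
}
```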

It should be noted that BPF token is conceptually quite similar to the idea of
a /dev/bpf device file, proposed by Song a while ago ([7]). The biggest
difference is the use of a virtual anon_inode file to hold a BPF token, which
allows multiple independent instances, each (potentially) with its own set of
restrictions. Also, crucially, the BPF token approach doesn't use any special
stateful task-scoped flags. Instead, the bpf() syscall accepts token_fd
parameters explicitly for each relevant BPF command. This addresses the main
concerns brought up during the /dev/bpf discussion and fits better with the
overall BPF subsystem design.
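
Because the token travels with each command instead of living in task state,
it has to be resolved per call. The sketch below models that for
BPF_MAP_CREATE, reusing the BPF_F_TOKEN_FD bit value and the
map_flags/map_token_fd pairing from the include/uapi/linux/bpf.h diff. The fd
table, the model_* structs, the -EBADF error code, and the choice to ignore
(rather than reject) a token that doesn't delegate the command are assumptions
for illustration, not something this commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_F_TOKEN_FD (1U << 16) /* same bit as BPF_F_TOKEN_FD */

/* Hypothetical, simplified command enum for this model. */
enum model_cmd { CMD_MAP_CREATE, CMD_PROG_LOAD, CMD_MAX };

struct model_token {
	uint64_t allowed_cmds;
};

/* Subset of the BPF_MAP_CREATE attributes relevant here. */
struct model_map_create_attr {
	uint32_t map_flags;
	int32_t map_token_fd;
};

/* Hypothetical per-process fd table: index = fd, NULL = not a token. */
#define MODEL_MAX_FDS 8
static struct model_token *fd_table[MODEL_MAX_FDS];

/* Returns 0 and sets *out (possibly to NULL), or a negative errno. */
static int model_resolve_token(const struct model_map_create_attr *attr,
			       struct model_token **out)
{
	struct model_token *tok;

	*out = NULL;
	if (!(attr->map_flags & MODEL_F_TOKEN_FD))
		return 0; /* no token: system-wide capability checks apply */
	if (attr->map_token_fd < 0 || attr->map_token_fd >= MODEL_MAX_FDS ||
	    !fd_table[attr->map_token_fd])
		return -EBADF; /* flag set, but the fd is not a usable token */
	tok = fd_table[attr->map_token_fd];
	/* A token that doesn't delegate this command is simply not used. */
	if (tok->allowed_cmds & (UINT64_C(1) << CMD_MAP_CREATE))
		*out = tok;
	return 0;
}
```

Note that map_token_fd is only looked at when the flag is set, so an fd value
of 0 in an otherwise zeroed attribute never gets misread as a token.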

The second part of this patch set adds full support for BPF token in libbpf's
high-level BPF object API. A good chunk of the changes rework libbpf's feature
detection internals, which are the part most affected by BPF token presence.

Besides internal refactorings, libbpf allows passing the location of the BPF FS
from which libbpf should create the BPF token. This can be done explicitly
through a new bpf_object_open_opts.bpf_token_path field. But we also add
implicit BPF token creation logic to the BPF object load step, without any
explicit involvement of the user. If the environment is set up properly, a BPF
token will be created transparently and used implicitly. This allows all
existing applications to gain BPF token support just by linking with the latest
version of libbpf; no source code modifications are required. All of this
assumes that a privileged container management agent has properly set up the
default BPF FS instance at /sys/fs/bpf to allow BPF token creation.

libbpf also supports overriding the default BPF FS location for BPF token
creation through the LIBBPF_BPF_TOKEN_PATH envvar. This allows admins or
container managers to mount a BPF token-enabled BPF FS at a non-standard
location without the need to coordinate with applications.
LIBBPF_BPF_TOKEN_PATH can also be used to disable implicit BPF token creation
by setting it to an empty value.
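
The path-selection rules from the two paragraphs above can be summarized in a
small sketch. It assumes that an explicit bpf_object_open_opts.bpf_token_path
takes precedence over LIBBPF_BPF_TOKEN_PATH, that an empty string from either
source disables token creation, that the conventional /sys/fs/bpf mount point
is the implicit default, and that token creation is mandatory only when a path
was requested explicitly (best effort otherwise). The model_* names are
hypothetical; this is not libbpf's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_DEFAULT_BPFFS "/sys/fs/bpf"

struct model_token_plan {
	const char *path; /* BPF FS to derive the token from; NULL = no token */
	bool mandatory;   /* true if failing to create the token is fatal */
};

/*
 * opt_path: bpf_object_open_opts.bpf_token_path (NULL if unset)
 * env_path: getenv("LIBBPF_BPF_TOKEN_PATH") (NULL if unset)
 */
static struct model_token_plan model_plan_token(const char *opt_path,
						const char *env_path)
{
	struct model_token_plan plan = { NULL, false };
	const char *path = opt_path ? opt_path : env_path;

	if (path && path[0] == '\0')
		return plan; /* explicitly disabled */
	if (path) {
		plan.path = path;
		plan.mandatory = true; /* caller asked for this location */
	} else {
		plan.path = MODEL_DEFAULT_BPFFS; /* implicit, best effort */
	}
	return plan;
}
```

In a real program, env_path would be the result of
getenv("LIBBPF_BPF_TOKEN_PATH"), read once when the BPF object is opened.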

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=*
  [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810260&state=*
  [2] https://patchwork.kernel.org/project/netdevbpf/list/?series=809800&state=*
  [3] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/
  [4] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
  [5] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
  [6] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
  [7] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/

v1->v2:
  - disable BPF token creation in init userns, and simplify
    bpf_token_capable() logic (Christian);
  - use kzalloc/kfree instead of kvzalloc/kvfree (Linus);
  - a few more selftest cases to validate LSM and BPF token interactions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
====================

Link: https://lore.kernel.org/r/20240124022127.2379740-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko authored and Alexei Starovoitov committed Jan 25, 2024
2 parents c9f1155 + 906ee42 commit c8632ac
Showing 41 changed files with 2,982 additions and 644 deletions.
2 changes: 1 addition & 1 deletion drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
85 changes: 75 additions & 10 deletions include/linux/bpf.h
@@ -52,6 +52,10 @@ struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
+struct user_namespace;
+struct super_block;
+struct inode;

 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1485,6 +1489,7 @@ struct bpf_prog_aux {
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
+	struct bpf_token *token;
 	struct bpf_prog_offload *offload;
 	struct btf *btf;
 	struct bpf_func_info *func_info;
@@ -1609,6 +1614,31 @@ struct bpf_link_primer {
 	u32 id;
 };

+struct bpf_mount_opts {
+	kuid_t uid;
+	kgid_t gid;
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	struct user_namespace *userns;
+	u64 allowed_cmds;
+	u64 allowed_maps;
+	u64 allowed_progs;
+	u64 allowed_attachs;
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;

@@ -2097,6 +2127,7 @@ static inline void bpf_enable_instrumentation(void)
 	migrate_enable();
 }

+extern const struct super_operations bpf_super_ops;
 extern const struct file_operations bpf_map_fops;
 extern const struct file_operations bpf_prog_fops;
 extern const struct file_operations bpf_iter_fops;
@@ -2231,24 +2262,26 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)

 extern int sysctl_unprivileged_bpf_disabled;

-static inline bool bpf_allow_ptr_leaks(void)
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+
+static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }

-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }

-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }

-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return cpu_mitigations_off() || perfmon_capable();
+	return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
 }

 int bpf_map_new_fd(struct bpf_map *map, int flags);
@@ -2265,8 +2298,21 @@ int bpf_link_new_fd(struct bpf_link *link);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);

+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+int bpf_token_create(union bpf_attr *attr);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
+struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
+			    umode_t mode);

 #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
 #define DEFINE_BPF_ITER_FUNC(target, args...)			\
@@ -2507,7 +2553,8 @@ int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt,
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);

-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2626,6 +2673,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }

+static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
@@ -2749,7 +2814,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }

 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
2 changes: 1 addition & 1 deletion include/linux/filter.h
@@ -1140,7 +1140,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;

 	return true;
15 changes: 11 additions & 4 deletions include/linux/lsm_hook_defs.h
@@ -404,10 +404,17 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
-LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
-LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
-LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
-LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
+LSM_HOOK(int, 0, bpf_map_create, struct bpf_map *map, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
+LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union bpf_attr *attr,
+	 struct path *path)
+LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
+LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum bpf_cmd cmd)
+LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
 #endif /* CONFIG_BPF_SYSCALL */

 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)
43 changes: 36 additions & 7 deletions include/linux/security.h
@@ -32,6 +32,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/sockptr.h>
+#include <linux/bpf.h>
 #include <uapi/linux/lsm.h>

 struct linux_binprm;
@@ -2064,15 +2065,22 @@ static inline void securityfs_remove(struct dentry *dentry)
 union bpf_attr;
 struct bpf_map;
 struct bpf_prog;
-struct bpf_prog_aux;
+struct bpf_token;
 #ifdef CONFIG_SECURITY
 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
 extern int security_bpf_prog(struct bpf_prog *prog);
-extern int security_bpf_map_alloc(struct bpf_map *map);
+extern int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+				   struct bpf_token *token);
 extern void security_bpf_map_free(struct bpf_map *map);
-extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
-extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+				  struct bpf_token *token);
+extern void security_bpf_prog_free(struct bpf_prog *prog);
+extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+				     struct path *path);
+extern void security_bpf_token_free(struct bpf_token *token);
+extern int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 					     unsigned int size)
@@ -2090,21 +2098,42 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
 	return 0;
 }

-static inline int security_bpf_map_alloc(struct bpf_map *map)
+static inline int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
+					  struct bpf_token *token)
 {
 	return 0;
 }

 static inline void security_bpf_map_free(struct bpf_map *map)
 { }

-static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
+					 struct bpf_token *token)
 {
 	return 0;
 }

-static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+static inline void security_bpf_prog_free(struct bpf_prog *prog)
 { }
+
+static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
+					    struct path *path)
+{
+	return 0;
+}
+
+static inline void security_bpf_token_free(struct bpf_token *token)
+{ }
+
+static inline int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	return 0;
+}
+
+static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return 0;
+}
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */

55 changes: 55 additions & 0 deletions include/uapi/linux/bpf.h
@@ -847,6 +847,36 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		  BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		  types to be loaded with BPF_PROG_LOAD command, if
+ *		  BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +931,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };

 enum bpf_map_type {
@@ -951,6 +983,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };

 /* Note that tracing related programs such as
@@ -995,6 +1028,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };

 enum bpf_attach_type {
@@ -1333,6 +1367,9 @@ enum {

 /* Flag for value_type_btf_obj_fd, the fd is available */
 	BPF_F_VTYPE_BTF_OBJ_FD	= (1U << 15),
+
+/* BPF token FD is passed in a corresponding command's token_fd field */
+	BPF_F_TOKEN_FD		= (1U << 16),
 };

 /* Flags for BPF_PROG_QUERY. */
@@ -1411,6 +1448,10 @@ union bpf_attr {
 						   * type data for
 						   * btf_vmlinux_value_type_id.
 						   */
+		/* BPF token FD to use with BPF_MAP_CREATE operation.
+		 * If provided, map_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32	map_token_fd;
 	};

 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -1480,6 +1521,10 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		/* BPF token FD to use with BPF_PROG_LOAD operation.
+		 * If provided, prog_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		prog_token_fd;
 	};

 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -1592,6 +1637,11 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_flags;
+		/* BPF token FD to use with BPF_BTF_LOAD operation.
+		 * If provided, btf_flags should have BPF_F_TOKEN_FD flag set.
+		 */
+		__s32		btf_token_fd;
 	};

 	struct {
@@ -1722,6 +1772,11 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;

+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_fd;
+	} token_create;
+
 } __attribute__((aligned(8)));

 /* The description below is an attempt at providing documentation to eBPF