Skip to content

Commit

Permalink
Merge branch 'bpf-sk-lookup'
Browse files Browse the repository at this point in the history
Joe Stringer says:

====================
This series proposes a new helper for the BPF API which allows BPF programs to
perform lookups for sockets in a network namespace. This would allow programs
to determine early on in processing whether the stack is expecting to receive
the packet, and perform some action (eg drop, forward somewhere) based on this
information.

The series is structured roughly into:
* Misc refactor
* Add the socket pointer type
* Add reference tracking to ensure that socket references are freed
* Extend the BPF API to add sk_lookup_xxx() / sk_release() functions
* Add tests/documentation

The helper proposed in this series includes a parameter for a tuple which must
be filled in by the caller to determine the socket to look up. The simplest
case would be filling with the contents of the packet, ie mapping the packet's
5-tuple into the parameter. In common cases, it may alternatively be useful to
reverse the direction of the tuple and perform a lookup, to find the socket
that initiates this connection; and if the BPF program ever performs a form of
IP address translation, it may further be useful to be able to look up
arbitrary tuples that are not based upon the packet, but instead based on state
held in BPF maps or hardcoded in the BPF program.

Currently, access into the socket's fields are limited to those which are
otherwise already accessible, and are restricted to read-only access.

Changes since v3:
* New patch: "bpf: Reuse canonical string formatter for ctx errs"
* Add PTR_TO_SOCKET to is_ctx_reg().
* Add a few new checks to prevent mixing of socket/non-socket pointers.
* Swap order of checks in sock_filter_is_valid_access().
* Prefix register spill macros with "bpf_".
* Add acks from previous round
* Rebase

Changes since v2:
* New patch: "selftests/bpf: Generalize dummy program types".
  This enables adding verifier tests for socket lookup with tail calls.
* Define the semantics of the new helpers more clearly in uAPI header.
* Fix release of caller_net when netns is not specified.
* Use skb->sk to find caller net when skb->dev is unavailable.
* Fix build with !CONFIG_NET.
* Replace ptr_id defensive coding when releasing reference state with an
  internal error (-EFAULT).
* Remove flags argument to sk_release().
* Add several new assembly tests suggested by Daniel.
* Add a few new C tests.
* Fix typo in verifier error message.

Changes since v1:
* Limit netns_id field to 32 bits
* Reuse reg_type_mismatch() in more places
* Reduce the number of passes at convert_ctx_access()
* Replace ptr_id defensive coding when releasing reference state with an
  internal error (-EFAULT)
* Rework 'struct bpf_sock_tuple' to allow passing a packet pointer
* Allow direct packet access from helper
* Fix compile error with CONFIG_IPV6 enabled
* Improve commit messages

Changes since RFC:
* Split up sk_lookup() into sk_lookup_tcp(), sk_lookup_udp().
* Only take references on the socket when necessary.
  * Make sk_release() only free the socket reference in this case.
* Fix some runtime reference leaks:
  * Disallow BPF_LD_[ABS|IND] instructions while holding a reference.
  * Disallow bpf_tail_call() while holding a reference.
* Prevent the same instruction being used for reference and other
  pointer type.
* Simplify locating copies of a reference during helper calls by caching
  the pointer id from the caller.
* Fix kbuild compilation warnings with particular configs.
* Improve code comments describing the new verifier pieces.
* Tested by Nitin
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  • Loading branch information
Daniel Borkmann committed Oct 3, 2018
2 parents 940656f + a610b66 commit 33d9a7f
Show file tree
Hide file tree
Showing 14 changed files with 2,002 additions and 157 deletions.
64 changes: 64 additions & 0 deletions Documentation/networking/filter.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1125,6 +1125,14 @@ pointer type. The types of pointers describe their base, as follows:
PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
Expand Down Expand Up @@ -1171,6 +1179,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and in
the non-NULL case, and pass the valid reference to the socket release function.

Direct packet access
--------------------
Expand Down Expand Up @@ -1444,6 +1459,55 @@ Error:
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'

Program that performs a socket lookup then sets the pointer to NULL without
checking it:
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (b7) r0 = 0
9: (95) exit
Unreleased reference id=1, alloc_insn=7

Program that performs a socket lookup but does not NULL-check the returned
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (95) exit
Unreleased reference id=1, alloc_insn=7

Testing
-------

Expand Down
34 changes: 34 additions & 0 deletions include/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,7 @@ enum bpf_arg_type {

ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING, /* any (initialized) argument is ok */
ARG_PTR_TO_SOCKET, /* pointer to bpf_sock */
};

/* type of values returned from helper functions */
Expand All @@ -162,6 +163,7 @@ enum bpf_return_type {
RET_VOID, /* function doesn't return anything */
RET_PTR_TO_MAP_VALUE, /* returns a pointer to map elem value */
RET_PTR_TO_MAP_VALUE_OR_NULL, /* returns a pointer to map elem value or NULL */
RET_PTR_TO_SOCKET_OR_NULL, /* returns a pointer to a socket or NULL */
};

/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
Expand Down Expand Up @@ -213,6 +215,8 @@ enum bpf_reg_type {
PTR_TO_PACKET, /* reg points to skb->data */
PTR_TO_PACKET_END, /* skb->data + headlen */
PTR_TO_FLOW_KEYS, /* reg points to bpf_flow_keys */
PTR_TO_SOCKET, /* reg points to struct bpf_sock */
PTR_TO_SOCKET_OR_NULL, /* reg points to struct bpf_sock or NULL */
};

/* The information passed from prog-specific *_is_valid_access
Expand Down Expand Up @@ -343,6 +347,11 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void);

typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
unsigned long off, unsigned long len);
typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
const struct bpf_insn *src,
struct bpf_insn *dst,
struct bpf_prog *prog,
u32 *target_size);

u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
Expand Down Expand Up @@ -836,4 +845,29 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

#if defined(CONFIG_NET)
bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
struct bpf_insn_access_aux *info);
u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size);
#else
static inline bool bpf_sock_is_valid_access(int off, int size,
enum bpf_access_type type,
struct bpf_insn_access_aux *info)
{
return false;
}
static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
const struct bpf_insn *si,
struct bpf_insn *insn_buf,
struct bpf_prog *prog,
u32 *target_size)
{
return 0;
}
#endif

#endif /* _LINUX_BPF_H */
37 changes: 34 additions & 3 deletions include/linux/bpf_verifier.h
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@ struct bpf_reg_state {
* offset, so they can share range knowledge.
* For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
* came from, when one is tested for != NULL.
* For PTR_TO_SOCKET this is used to share which pointers retain the
* same reference to the socket, to determine proper reference freeing.
*/
u32 id;
/* For scalar types (SCALAR_VALUE), this represents our knowledge of
Expand Down Expand Up @@ -102,6 +104,17 @@ struct bpf_stack_state {
u8 slot_type[BPF_REG_SIZE];
};

struct bpf_reference_state {
/* Track each reference created with a unique id, even if the same
* instruction creates the reference multiple times (eg, via CALL).
*/
int id;
/* Instruction where the allocation of this reference occurred. This
* is used purely to inform the user of a reference leak.
*/
int insn_idx;
};

/* state of the program:
* type of all registers and stack info
*/
Expand All @@ -119,7 +132,9 @@ struct bpf_func_state {
*/
u32 subprogno;

/* should be second to last. See copy_func_state() */
/* The following fields should be last. See copy_func_state() */
int acquired_refs;
struct bpf_reference_state *refs;
int allocated_stack;
struct bpf_stack_state *stack;
};
Expand All @@ -131,6 +146,17 @@ struct bpf_verifier_state {
u32 curframe;
};

#define bpf_get_spilled_reg(slot, frame) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
(frame->stack[slot].slot_type[0] == STACK_SPILL)) \
? &frame->stack[slot].spilled_ptr : NULL)

/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
#define bpf_for_each_spilled_reg(iter, frame, reg) \
for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \
iter < frame->allocated_stack / BPF_REG_SIZE; \
iter++, reg = bpf_get_spilled_reg(iter, frame))

/* linked list of verifier states used to prune search */
struct bpf_verifier_state_list {
struct bpf_verifier_state state;
Expand Down Expand Up @@ -204,11 +230,16 @@ __printf(2, 0) void bpf_verifier_vlog(struct bpf_verifier_log *log,
__printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
const char *fmt, ...);

static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
{
struct bpf_verifier_state *cur = env->cur_state;

return cur->frame[cur->curframe]->regs;
return cur->frame[cur->curframe];
}

static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
{
return cur_func(env)->regs;
}

int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
Expand Down
93 changes: 92 additions & 1 deletion include/uapi/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -2144,6 +2144,77 @@ union bpf_attr {
* request in the skb.
* Return
* 0 on success, or a negative error in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for TCP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this in the netns of the device in the skb. For socket hooks,
* this in the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
* Description
* Look for UDP socket matching *tuple*, optionally in a child
* network namespace *netns*. The return value must be checked,
* and if non-NULL, released via **bpf_sk_release**\ ().
*
* The *ctx* should point to the context of the program, such as
* the skb or socket (depending on the hook in use). This is used
* to determine the base network namespace for the lookup.
*
* *tuple_size* must be one of:
*
* **sizeof**\ (*tuple*\ **->ipv4**)
* Look for an IPv4 socket.
* **sizeof**\ (*tuple*\ **->ipv6**)
* Look for an IPv6 socket.
*
* If the *netns* is zero, then the socket lookup table in the
* netns associated with the *ctx* will be used. For the TC hooks,
* this in the netns of the device in the skb. For socket hooks,
* this in the netns of the socket. If *netns* is non-zero, then
* it specifies the ID of the netns relative to the netns
* associated with the *ctx*.
*
* All values for *flags* are reserved for future usage, and must
* be left at zero.
*
* This helper is available only if the kernel was compiled with
* **CONFIG_NET** configuration option.
* Return
* Pointer to *struct bpf_sock*, or NULL in case of failure.
*
* int bpf_sk_release(struct bpf_sock *sk)
* Description
* Release the reference held by *sock*. *sock* must be a non-NULL
* pointer that was returned from bpf_sk_lookup_xxx\ ().
* Return
* 0 on success, or a negative error in case of failure.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
Expand Down Expand Up @@ -2229,7 +2300,10 @@ union bpf_attr {
FN(get_current_cgroup_id), \
FN(get_local_storage), \
FN(sk_select_reuseport), \
FN(skb_ancestor_cgroup_id),
FN(skb_ancestor_cgroup_id), \
FN(sk_lookup_tcp), \
FN(sk_lookup_udp), \
FN(sk_release),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
Expand Down Expand Up @@ -2399,6 +2473,23 @@ struct bpf_sock {
*/
};

struct bpf_sock_tuple {
union {
struct {
__be32 saddr;
__be32 daddr;
__be16 sport;
__be16 dport;
} ipv4;
struct {
__be32 saddr[4];
__be32 daddr[4];
__be16 sport;
__be16 dport;
} ipv6;
};
};

#define XDP_PACKET_HEADROOM 256

/* User return codes for XDP prog type.
Expand Down
Loading

0 comments on commit 33d9a7f

Please sign in to comment.