Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Browse files Browse the repository at this point in the history
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Oct 9, 2018
2 parents 9000a45 + df3f94a commit 071a234
Show file tree
Hide file tree
Showing 66 changed files with 4,184 additions and 707 deletions.
4 changes: 2 additions & 2 deletions Documentation/networking/af_xdp.rst
Original file line number Diff line number Diff line change
Expand Up @@ -159,8 +159,8 @@ log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
and 3000 refers to the same chunk.


UMEM Completetion Ring
~~~~~~~~~~~~~~~~~~~~~~
UMEM Completion Ring
~~~~~~~~~~~~~~~~~~~~

The Completion Ring is used transfer ownership of UMEM frames from
kernel-space to user-space. Just like the Fill ring, UMEM indicies are
Expand Down
94 changes: 80 additions & 14 deletions Documentation/networking/filter.txt
Original file line number Diff line number Diff line change
Expand Up @@ -203,11 +203,11 @@ opcodes as defined in linux/filter.h stand for:

Instruction Addressing mode Description

ld 1, 2, 3, 4, 10 Load word into A
ld 1, 2, 3, 4, 12 Load word into A
ldi 4 Load word into A
ldh 1, 2 Load half-word into A
ldb 1, 2 Load byte into A
ldx 3, 4, 5, 10 Load word into X
ldx 3, 4, 5, 12 Load word into X
ldxi 4 Load word into X
ldxb 5 Load byte into X

Expand All @@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for:

jmp 6 Jump to label
ja 6 Jump to label
jeq 7, 8 Jump on A == k
jneq 8 Jump on A != k
jne 8 Jump on A != k
jlt 8 Jump on A < k
jle 8 Jump on A <= k
jgt 7, 8 Jump on A > k
jge 7, 8 Jump on A >= k
jset 7, 8 Jump on A & k
jeq 7, 8, 9, 10 Jump on A == <x>
jneq 9, 10 Jump on A != <x>
jne 9, 10 Jump on A != <x>
jlt 9, 10 Jump on A < <x>
jle 9, 10 Jump on A <= <x>
jgt 7, 8, 9, 10 Jump on A > <x>
jge 7, 8, 9, 10 Jump on A >= <x>
jset 7, 8, 9, 10 Jump on A & <x>

add 0, 4 A + <x>
sub 0, 4 A - <x>
Expand All @@ -240,7 +240,7 @@ opcodes as defined in linux/filter.h stand for:
tax Copy A into X
txa Copy X into A

ret 4, 9 Return
ret 4, 11 Return

The next table shows addressing formats from the 2nd column:

Expand All @@ -254,9 +254,11 @@ The next table shows addressing formats from the 2nd column:
5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet
6 L Jump label L
7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf
8 #k,Lt Jump to Lt if predicate is true
9 a/%a Accumulator A
10 extension BPF extension
8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf
9 #k,Lt Jump to Lt if predicate is true
10 x/%x,Lt Jump to Lt if predicate is true
11 a/%a Accumulator A
12 extension BPF extension

The Linux kernel also has a couple of BPF extensions that are used along
with the class of load instructions by "overloading" the k argument with
Expand Down Expand Up @@ -1125,6 +1127,14 @@ pointer type. The types of pointers describe their base, as follows:
PTR_TO_STACK Frame pointer.
PTR_TO_PACKET skb->data.
PTR_TO_PACKET_END skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET Pointer to struct bpf_sock_ops, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
Either a pointer to a socket, or NULL; socket lookup
returns this type, which becomes a PTR_TO_SOCKET when
checked != NULL. PTR_TO_SOCKET is reference-counted,
so programs must release the reference through the
socket release function before the end of the program.
Arithmetic on these pointers is forbidden.
However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
Expand Down Expand Up @@ -1171,6 +1181,13 @@ over the Ethernet header, then reads IHL and addes (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding 'struct sock'. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and in
the non-NULL case, and pass the valid reference to the socket release function.

Direct packet access
--------------------
Expand Down Expand Up @@ -1444,6 +1461,55 @@ Error:
8: (7a) *(u64 *)(r0 +0) = 1
R0 invalid mem access 'imm'

Program that performs a socket lookup then sets the pointer to NULL without
checking it:
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (b7) r0 = 0
9: (95) exit
Unreleased reference id=1, alloc_insn=7

Program that performs a socket lookup but does not NULL-check the returned
value:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_MOV64_IMM(BPF_REG_3, 4),
BPF_MOV64_IMM(BPF_REG_4, 0),
BPF_MOV64_IMM(BPF_REG_5, 0),
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
BPF_EXIT_INSN(),
Error:
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (bf) r2 = r10
3: (07) r2 += -8
4: (b7) r3 = 4
5: (b7) r4 = 0
6: (b7) r5 = 0
7: (85) call bpf_sk_lookup_tcp#65
8: (95) exit
Unreleased reference id=1, alloc_insn=7

Testing
-------

Expand Down
71 changes: 63 additions & 8 deletions drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
Original file line number Diff line number Diff line change
Expand Up @@ -89,15 +89,32 @@ nfp_bpf_cmsg_alloc(struct nfp_app_bpf *bpf, unsigned int size)
return skb;
}

static unsigned int
nfp_bpf_cmsg_map_req_size(struct nfp_app_bpf *bpf, unsigned int n)
{
unsigned int size;

size = sizeof(struct cmsg_req_map_op);
size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;

return size;
}

static struct sk_buff *
nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
{
return nfp_bpf_cmsg_alloc(bpf, nfp_bpf_cmsg_map_req_size(bpf, n));
}

static unsigned int
nfp_bpf_cmsg_map_reply_size(struct nfp_app_bpf *bpf, unsigned int n)
{
unsigned int size;

size = sizeof(struct cmsg_req_map_op);
size += sizeof(struct cmsg_key_value_pair) * n;
size = sizeof(struct cmsg_reply_map_op);
size += (bpf->cmsg_key_sz + bpf->cmsg_val_sz) * n;

return nfp_bpf_cmsg_alloc(bpf, size);
return size;
}

static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
Expand Down Expand Up @@ -338,6 +355,34 @@ void nfp_bpf_ctrl_free_map(struct nfp_app_bpf *bpf, struct nfp_bpf_map *nfp_map)
dev_consume_skb_any(skb);
}

static void *
nfp_bpf_ctrl_req_key(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}

static void *
nfp_bpf_ctrl_req_val(struct nfp_app_bpf *bpf, struct cmsg_req_map_op *req,
unsigned int n)
{
return &req->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}

static void *
nfp_bpf_ctrl_reply_key(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * n + bpf->cmsg_val_sz * n];
}

static void *
nfp_bpf_ctrl_reply_val(struct nfp_app_bpf *bpf, struct cmsg_reply_map_op *reply,
unsigned int n)
{
return &reply->data[bpf->cmsg_key_sz * (n + 1) + bpf->cmsg_val_sz * n];
}

static int
nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,
enum nfp_bpf_cmsg_type op,
Expand Down Expand Up @@ -366,12 +411,13 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,

/* Copy inputs */
if (key)
memcpy(&req->elem[0].key, key, map->key_size);
memcpy(nfp_bpf_ctrl_req_key(bpf, req, 0), key, map->key_size);
if (value)
memcpy(&req->elem[0].value, value, map->value_size);
memcpy(nfp_bpf_ctrl_req_val(bpf, req, 0), value,
map->value_size);

skb = nfp_bpf_cmsg_communicate(bpf, skb, op,
sizeof(*reply) + sizeof(*reply->elem));
nfp_bpf_cmsg_map_reply_size(bpf, 1));
if (IS_ERR(skb))
return PTR_ERR(skb);

Expand All @@ -382,9 +428,11 @@ nfp_bpf_ctrl_entry_op(struct bpf_offloaded_map *offmap,

/* Copy outputs */
if (out_key)
memcpy(out_key, &reply->elem[0].key, map->key_size);
memcpy(out_key, nfp_bpf_ctrl_reply_key(bpf, reply, 0),
map->key_size);
if (out_value)
memcpy(out_value, &reply->elem[0].value, map->value_size);
memcpy(out_value, nfp_bpf_ctrl_reply_val(bpf, reply, 0),
map->value_size);

dev_consume_skb_any(skb);

Expand Down Expand Up @@ -428,6 +476,13 @@ int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
key, NULL, 0, next_key, NULL);
}

unsigned int nfp_bpf_ctrl_cmsg_mtu(struct nfp_app_bpf *bpf)
{
return max3((unsigned int)NFP_NET_DEFAULT_MTU,
nfp_bpf_cmsg_map_req_size(bpf, 1),
nfp_bpf_cmsg_map_reply_size(bpf, 1));
}

void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
{
struct nfp_app_bpf *bpf = app->priv;
Expand Down
11 changes: 4 additions & 7 deletions drivers/net/ethernet/netronome/nfp/bpf/fw.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ enum bpf_cap_tlv_type {
NFP_BPF_CAP_TYPE_RANDOM = 4,
NFP_BPF_CAP_TYPE_QUEUE_SELECT = 5,
NFP_BPF_CAP_TYPE_ADJUST_TAIL = 6,
NFP_BPF_CAP_TYPE_ABI_VERSION = 7,
};

struct nfp_bpf_cap_tlv_func {
Expand Down Expand Up @@ -98,6 +99,7 @@ enum nfp_bpf_cmsg_type {
#define CMSG_TYPE_MAP_REPLY_BIT 7
#define __CMSG_REPLY(req) (BIT(CMSG_TYPE_MAP_REPLY_BIT) | (req))

/* BPF ABIv2 fixed-length control message fields */
#define CMSG_MAP_KEY_LW 16
#define CMSG_MAP_VALUE_LW 16

Expand Down Expand Up @@ -147,24 +149,19 @@ struct cmsg_reply_map_free_tbl {
__be32 count;
};

struct cmsg_key_value_pair {
__be32 key[CMSG_MAP_KEY_LW];
__be32 value[CMSG_MAP_VALUE_LW];
};

struct cmsg_req_map_op {
struct cmsg_hdr hdr;
__be32 tid;
__be32 count;
__be32 flags;
struct cmsg_key_value_pair elem[0];
u8 data[0];
};

struct cmsg_reply_map_op {
struct cmsg_reply_map_simple reply_hdr;
__be32 count;
__be32 resv;
struct cmsg_key_value_pair elem[0];
u8 data[0];
};

struct cmsg_bpf_event {
Expand Down
Loading

0 comments on commit 071a234

Please sign in to comment.