-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
bpf: sk_lookup: Add user documentation
Describe the purpose of BPF sk_lookup program, how it can be attached, when it gets invoked, and what information gets passed to it. Point the reader to examples and further documentation. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821100226.403844-1-jakub@cloudflare.com
- Loading branch information
Jakub Sitnicki
authored and
Alexei Starovoitov
committed
Aug 24, 2020
1 parent
4d0d167
commit 07ff4f0
Showing
2 changed files
with
99 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,6 +52,7 @@ Program types | |
prog_cgroup_sysctl | ||
prog_flow_dissector | ||
bpf_lsm | ||
prog_sk_lookup | ||
|
||
|
||
Map types | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) | ||
===================== | ||
BPF sk_lookup program | ||
===================== | ||
|
||
BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability | ||
into the socket lookup performed by the transport layer when a packet is to be | ||
delivered locally. | ||
|
||
When invoked BPF sk_lookup program can select a socket that will receive the | ||
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function. | ||
|
||
Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP. | ||
|
||
Motivation | ||
========== | ||
|
||
BPF sk_lookup program type was introduced to address setup scenarios where | ||
binding sockets to an address with ``bind()`` socket call is impractical, such | ||
as: | ||
|
||
1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when | ||
binding to a wildcard address ``INADRR_ANY`` is not possible due to a port | ||
conflict, | ||
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use | ||
case. | ||
|
||
Such setups would require creating and ``bind()``'ing one socket to each of the | ||
IP address/port in the range, leading to resource consumption and potential | ||
latency spikes during socket lookup. | ||
|
||
Attachment | ||
========== | ||
|
||
BPF sk_lookup program can be attached to a network namespace with | ||
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a | ||
netns FD as attachment ``target_fd``. | ||
|
||
Multiple programs can be attached to one network namespace. Programs will be | ||
invoked in the same order as they were attached. | ||
|
||
Hooks | ||
===== | ||
|
||
The attached BPF sk_lookup programs run whenever the transport layer needs to | ||
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet. | ||
|
||
Incoming traffic to established (TCP) and connected (UDP) sockets is delivered | ||
as usual without triggering the BPF sk_lookup hook. | ||
|
||
The attached BPF programs must return with either ``SK_PASS`` or ``SK_DROP`` | ||
verdict code. As for other BPF program types that are network filters, | ||
``SK_PASS`` signifies that the socket lookup should continue on to regular | ||
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the | ||
packet. | ||
|
||
A BPF sk_lookup program can also select a socket to receive the packet by | ||
calling ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket | ||
in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a | ||
``struct bpf_sock *`` to ``bpf_sk_assign()`` helper to record the | ||
selection. Selecting a socket only takes effect if the program has terminated | ||
with ``SK_PASS`` code. | ||
|
||
When multiple programs are attached, the end result is determined from return | ||
codes of all the programs according to the following rules: | ||
|
||
1. If any program returned ``SK_PASS`` and selected a valid socket, the socket | ||
is used as the result of the socket lookup. | ||
2. If more than one program returned ``SK_PASS`` and selected a socket, the last | ||
selection takes effect. | ||
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and | ||
selected a socket, socket lookup fails. | ||
4. If all programs returned ``SK_PASS`` and none of them selected a socket, | ||
socket lookup continues on. | ||
|
||
API | ||
=== | ||
|
||
In its context, an instance of ``struct bpf_sk_lookup``, BPF sk_lookup program | ||
receives information about the packet that triggered the socket lookup. Namely: | ||
|
||
* IP version (``AF_INET`` or ``AF_INET6``), | ||
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``), | ||
* source and destination IP address, | ||
* source and destination L4 port, | ||
* the socket that has been selected with ``bpf_sk_assign()``. | ||
|
||
Refer to ``struct bpf_sk_lookup`` declaration in ``linux/bpf.h`` user API | ||
header, and `bpf-helpers(7) | ||
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section | ||
for ``bpf_sk_assign()`` for details. | ||
|
||
Example | ||
======= | ||
|
||
See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference | ||
implementation. |