Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).

The main changes are:

1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.

2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.

3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.

4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.

5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.

6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.

7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.

8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.

9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.

10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.

11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.

12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Jun 28, 2021
2 parents 1fd07f3 + a78cae2 commit e1289cf
Showing 56 changed files with 394 additions and 380 deletions.
55 changes: 34 additions & 21 deletions Documentation/RCU/checklist.rst
@@ -211,27 +211,40 @@ over a rather long period of time, but improvements are always welcome!
of the system, especially to real-time workloads running on
the rest of the system.

7. As of v4.20, a given kernel implements only one RCU flavor,
which is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
If the updater uses call_rcu() or synchronize_rcu(),
then the corresponding readers may use rcu_read_lock() and
rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
or any pair of primitives that disables and re-enables preemption,
for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
If the updater uses synchronize_srcu() or call_srcu(),
then the corresponding readers must use srcu_read_lock() and
srcu_read_unlock(), and with the same srcu_struct. The rules for
the expedited primitives are the same as for their non-expedited
counterparts. Mixing things up will result in confusion and
broken kernels, and has even resulted in an exploitable security
issue.

One exception to this rule: rcu_read_lock() and rcu_read_unlock()
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
in cases where local bottom halves are already known to be
disabled, for example, in irq or softirq context. Commenting
such cases is a must, of course! And the jury is still out on
whether the increased speed is worth it.
7. As of v4.20, a given kernel implements only one RCU flavor, which
is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
If the updater uses call_rcu() or synchronize_rcu(), then
the corresponding readers may use: (1) rcu_read_lock() and
rcu_read_unlock(), (2) any pair of primitives that disables
and re-enables softirq, for example, rcu_read_lock_bh() and
rcu_read_unlock_bh(), or (3) any pair of primitives that disables
and re-enables preemption, for example, rcu_read_lock_sched() and
rcu_read_unlock_sched(). If the updater uses synchronize_srcu()
or call_srcu(), then the corresponding readers must use
srcu_read_lock() and srcu_read_unlock(), and with the same
srcu_struct. The rules for the expedited RCU grace-period-wait
primitives are the same as for their non-expedited counterparts.

If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
then the readers must refrain from executing voluntary
context switches, that is, from blocking. If the updater uses
call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
the corresponding readers must use rcu_read_lock_trace() and
rcu_read_unlock_trace(). If an updater uses call_rcu_tasks_rude()
or synchronize_rcu_tasks_rude(), then the corresponding readers
must use anything that disables interrupts.

Mixing things up will result in confusion and broken kernels, and
has even resulted in an exploitable security issue. Therefore,
when using non-obvious pairs of primitives, commenting is
of course a must. One example of non-obvious pairing is
the XDP feature in networking, which calls BPF programs from
network-driver NAPI (softirq) context. BPF relies heavily on RCU
protection for its data structures, but because the BPF program
invocation happens entirely within a single local_bh_disable()
section in a NAPI poll cycle, this usage is safe. The reason
that this usage is safe is that readers can use anything that
disables BH when updaters use call_rcu() or synchronize_rcu().

8. Although synchronize_rcu() is slower than is call_rcu(), it
usually results in simpler code. So, unless update performance is
13 changes: 13 additions & 0 deletions Documentation/bpf/index.rst
@@ -12,6 +12,19 @@ BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.

libbpf
======

Libbpf is a userspace library for loading and interacting with bpf programs.

.. toctree::
   :maxdepth: 1

   libbpf/libbpf
   libbpf/libbpf_api
   libbpf/libbpf_build
   libbpf/libbpf_naming_convention

BPF Type Format (BTF)
=====================

14 changes: 14 additions & 0 deletions Documentation/bpf/libbpf/libbpf.rst
@@ -0,0 +1,14 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

libbpf
======

This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.

All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to the bpf@vger.kernel.org mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
mailing list and search its `archive <https://lore.kernel.org/bpf/>`_.
Please search the archive before asking new questions. It may well be
that your question has already been addressed or answered.
27 changes: 27 additions & 0 deletions Documentation/bpf/libbpf/libbpf_api.rst
@@ -0,0 +1,27 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

API
===

This documentation is autogenerated from header files in libbpf, tools/lib/bpf.

.. kernel-doc:: tools/lib/bpf/libbpf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/btf.h
   :internal:

.. kernel-doc:: tools/lib/bpf/xsk.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_tracing.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_core_read.h
   :internal:

.. kernel-doc:: tools/lib/bpf/bpf_endian.h
   :internal:
37 changes: 37 additions & 0 deletions Documentation/bpf/libbpf/libbpf_build.rst
@@ -0,0 +1,37 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

Building libbpf
===============

libelf and zlib are internal dependencies of libbpf, so applications must link
against them, and both must be installed on the system for applications to work.
pkg-config is used by default to find libelf, and the program called
can be overridden with PKG_CONFIG.

If using pkg-config at build time is not desired, it can be disabled by
setting NO_PKG_CONFIG=1 when calling make.

To build both static libbpf.a and shared libbpf.so:

.. code-block:: bash

    $ cd src
    $ make

To build only the static libbpf.a library in directory build/ and install it
together with the libbpf headers in a staging directory root/:

.. code-block:: bash

    $ cd src
    $ mkdir build root
    $ BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=root make install

To build both static libbpf.a and shared libbpf.so against a custom libelf
dependency installed in /build/root/ and install them together with libbpf
headers in a build directory /build/root/:

.. code-block:: bash

    $ cd src
    $ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
libbpf API naming convention
============================
API naming convention
=====================

libbpf API provides access to a few logically separated groups of
functions and types. Every group has its own naming convention
@@ -10,14 +10,14 @@ new function or type is added to keep libbpf API clean and consistent.

All types and functions provided by libbpf API should have one of the
following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``,
``perf_buffer_``.
``btf_dump_``, ``ring_buffer_``, ``perf_buffer_``.

System call wrappers
--------------------

System call wrappers are simple wrappers for commands supported by
sys_bpf system call. These wrappers should go to ``bpf.h`` header file
and map one-on-one to corresponding commands.
and map one to one to corresponding commands.

For example ``bpf_map_lookup_elem`` wraps ``BPF_MAP_LOOKUP_ELEM``
command of sys_bpf, ``bpf_prog_attach`` wraps ``BPF_PROG_ATTACH``, etc.
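
As a rough illustration of the one-to-one mapping, the sketch below shows what
a wrapper for ``BPF_MAP_LOOKUP_ELEM`` boils down to: fill in ``union bpf_attr``
and issue the ``bpf`` system call. The names ``sketch_map_lookup_elem`` and
``sketch_self_check`` are hypothetical and are not part of libbpf; the real
``bpf_map_lookup_elem`` in ``bpf.h`` may differ in detail. The sketch assumes a
Linux system with the UAPI header ``linux/bpf.h`` installed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Hypothetical stand-in for a libbpf-style wrapper; not the libbpf code. */
static long sketch_map_lookup_elem(int fd, const void *key, void *value)
{
	union bpf_attr attr;

	/* Unused attr fields must be zero, so clear the whole union first. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)fd;
	attr.key = (uint64_t)(uintptr_t)key;
	attr.value = (uint64_t)(uintptr_t)value;

	/* One wrapper, one command: BPF_MAP_LOOKUP_ELEM. Returns -1 on error. */
	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}

/* An invalid map fd can never succeed, whatever errno turns out to be. */
static void sketch_self_check(void)
{
	int key = 0, value = 0;

	assert(sketch_map_lookup_elem(-1, &key, &value) == -1);
}
```

The wrapper adds no policy of its own: it packs the arguments, names a single
command, and hands back whatever the system call returns.
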
@@ -49,10 +49,6 @@ object, ``bpf_object``, double underscore and ``open`` that defines the
purpose of the function to open ELF file and create ``bpf_object`` from
it.

Another example: ``bpf_program__load`` is named for corresponding
object, ``bpf_program``, that is separated from other part of the name
by double underscore.

All objects and corresponding functions other than BTF related should go
to ``libbpf.h``. BTF types and functions should go to ``btf.h``.

@@ -72,11 +68,7 @@ of both low-level ring access functions and high-level configuration
functions. These can be mixed and matched. Note that these functions
are not reentrant for performance reasons.

Please take a look at Documentation/networking/af_xdp.rst in the Linux
kernel source tree on how to use XDP sockets and for some common
mistakes in case you do not get any traffic up to user space.

libbpf ABI
ABI
==========

libbpf can be either linked statically or used as a DSO. To avoid possible
@@ -116,7 +108,8 @@ This bump in ABI version is at most once per kernel development cycle.

For example, if current state of ``libbpf.map`` is:

.. code-block::
.. code-block:: c
LIBBPF_0.0.1 {
global:
bpf_func_a;
@@ -128,7 +121,8 @@ For example, if current state of ``libbpf.map`` is:
, and a new symbol ``bpf_func_c`` is being introduced, then
``libbpf.map`` should be changed like this:

.. code-block::
.. code-block:: c
LIBBPF_0.0.1 {
global:
bpf_func_a;
@@ -148,7 +142,7 @@ Format of version script and ways to handle ABI changes, including
incompatible ones, are described in detail in [1].

Stand-alone build
=================
-------------------

Under https://github.com/libbpf/libbpf there is a (semi-)automated
mirror of the mainline's version of libbpf for a stand-alone build.
@@ -157,12 +151,12 @@ However, all changes to libbpf's code base must be upstreamed through
the mainline kernel tree.

License
=======
-------------------

libbpf is dual-licensed under LGPL 2.1 and BSD 2-Clause.

Links
=====
-------------------

[1] https://www.akkadia.org/drepper/dsohowto.pdf
(Chapter 3. Maintaining APIs and ABIs).
32 changes: 16 additions & 16 deletions Documentation/networking/af_xdp.rst
@@ -290,19 +290,19 @@ round-robin example of distributing packets is shown below:
#define MAX_SOCKS 16
struct {
__uint(type, BPF_MAP_TYPE_XSKMAP);
__uint(max_entries, MAX_SOCKS);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(type, BPF_MAP_TYPE_XSKMAP);
__uint(max_entries, MAX_SOCKS);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");
static unsigned int rr;
SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
rr = (rr + 1) & (MAX_SOCKS - 1);
rr = (rr + 1) & (MAX_SOCKS - 1);
return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
return bpf_redirect_map(&xsks_map, rr, XDP_DROP);
}
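
The masking step in the program above cycles through every slot only because
``MAX_SOCKS`` is a power of two. The following user-space sketch (plain C, not
BPF) reproduces the same arithmetic so that the wrap-around can be checked
outside the kernel. ``rr_next`` and ``rr_histogram`` are hypothetical helper
names introduced here for illustration.

```c
#include <assert.h>

#define MAX_SOCKS 16

/* The mask trick is only a modulo when MAX_SOCKS is a power of two. */
static_assert((MAX_SOCKS & (MAX_SOCKS - 1)) == 0,
	      "MAX_SOCKS must be a power of two");

/* Same arithmetic as the rr update in the XDP program. */
static unsigned int rr_next(unsigned int rr)
{
	return (rr + 1) & (MAX_SOCKS - 1);
}

/* Count how often each index is selected over n steps, starting at rr = 0. */
static void rr_histogram(unsigned int n, unsigned int hits[MAX_SOCKS])
{
	unsigned int rr = 0;
	unsigned int i;

	for (i = 0; i < MAX_SOCKS; i++)
		hits[i] = 0;

	for (i = 0; i < n; i++) {
		rr = rr_next(rr);
		hits[rr]++;
	}
}
```

Calling ``rr_histogram(32, hits)`` leaves every entry of ``hits`` at 2, that is,
two full passes over the 16 slots. With a ``MAX_SOCKS`` that is not a power of
two, the mask would skip some indices, which is why the sketch rejects such a
value at compile time.
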
Note, that since there is only a single set of FILL and COMPLETION
@@ -379,7 +379,7 @@ would look like this for the TX path:
.. code-block:: c
if (xsk_ring_prod__needs_wakeup(&my_tx_ring))
sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
sendto(xsk_socket__fd(xsk_handle), NULL, 0, MSG_DONTWAIT, NULL, 0);
I.e., only use the syscall if the flag is set.

@@ -442,9 +442,9 @@ purposes. The supported statistics are shown below:
.. code-block:: c
struct xdp_statistics {
__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
};
XDP_OPTIONS getsockopt
@@ -483,15 +483,15 @@ like this:
.. code-block:: c
// struct xdp_rxtx_ring {
// __u32 *producer;
// __u32 *consumer;
// struct xdp_desc *desc;
// __u32 *producer;
// __u32 *consumer;
// struct xdp_desc *desc;
// };
// struct xdp_umem_ring {
// __u32 *producer;
// __u32 *consumer;
// __u64 *desc;
// __u32 *producer;
// __u32 *consumer;
// __u64 *desc;
// };
// typedef struct xdp_rxtx_ring RING;