Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
…/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-11-30

We've added 30 non-merge commits during the last 7 day(s) which contain
a total of 58 files changed, 1598 insertions(+), 154 deletions(-).

The main changes are:

1) Add initial TX metadata implementation for AF_XDP with support in mlx5
   and stmmac drivers. Two types of offloads are supported right now, that
   is, TX timestamp and TX checksum offload, from Stanislav Fomichev with
   stmmac implementation from Song Yoong Siang.

2) Change BPF verifier logic to validate global subprograms lazily instead
   of unconditionally before the main program, so they can be guarded using
   BPF CO-RE techniques, from Andrii Nakryiko.

3) Add BPF link_info support for uprobe multi link along with bpftool
   integration for the latter, from Jiri Olsa.

4) Use pkg-config in BPF selftests to determine ld flags which is
   in particular needed for linking statically, from Akihiko Odaki.

5) Fix a few BPF selftest failures to adapt to the upcoming LLVM18,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (30 commits)
  bpf/tests: Remove duplicate JSGT tests
  selftests/bpf: Add TX side to xdp_hw_metadata
  selftests/bpf: Convert xdp_hw_metadata to XDP_USE_NEED_WAKEUP
  selftests/bpf: Add TX side to xdp_metadata
  selftests/bpf: Add csum helpers
  selftests/xsk: Support tx_metadata_len
  xsk: Add option to calculate TX checksum in SW
  xsk: Validate xsk_tx_metadata flags
  xsk: Document tx_metadata_len layout
  net: stmmac: Add Tx HWTS support to XDP ZC
  net/mlx5e: Implement AF_XDP TX timestamp and checksum offload
  tools: ynl: Print xsk-features from the sample
  xsk: Add TX timestamp and TX checksum offload support
  xsk: Support tx_metadata_len
  selftests/bpf: Use pkg-config for libelf
  selftests/bpf: Override PKG_CONFIG for static builds
  selftests/bpf: Choose pkg-config for the target
  bpftool: Add support to display uprobe_multi links
  selftests/bpf: Add link_info test for uprobe_multi link
  selftests/bpf: Use bpf_link__destroy in fill_link_info tests
  ...
====================

Conflicts:

Documentation/netlink/specs/netdev.yaml:
  839ff60 ("net: page_pool: add nlspec for basic access to page pools")
  48eb03d ("xsk: Add TX timestamp and TX checksum offload support")
https://lore.kernel.org/all/20231201094705.1ee3cab8@canb.auug.org.au/

While at it also regen, tree is dirty after:
  48eb03d ("xsk: Add TX timestamp and TX checksum offload support")
looks like code wasn't re-rendered after "render-max" was removed.

Link: https://lore.kernel.org/r/20231130145708.32573-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Jakub Kicinski committed Dec 1, 2023
2 parents 975f2d7 + f690ff9 commit 753c860
Show file tree
Hide file tree
Showing 58 changed files with 1,590 additions and 158 deletions.
19 changes: 18 additions & 1 deletion Documentation/netlink/specs/netdev.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,6 @@ definitions:
-
type: flags
name: xdp-rx-metadata
render-max: true
entries:
-
name: timestamp
Expand All @@ -55,6 +54,18 @@ definitions:
name: hash
doc:
Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
-
type: flags
name: xsk-flags
entries:
-
name: tx-timestamp
doc:
HW timestamping egress packets is supported by the driver.
-
name: tx-checksum
doc:
L3 checksum HW offload is supported by the driver.

attribute-sets:
-
Expand Down Expand Up @@ -86,6 +97,11 @@ attribute-sets:
See Documentation/networking/xdp-rx-metadata.rst for more details.
type: u64
enum: xdp-rx-metadata
-
name: xsk-features
doc: Bitmask of enabled AF_XDP features.
type: u64
enum: xsk-flags
-
name: page-pool
attributes:
Expand Down Expand Up @@ -209,6 +225,7 @@ operations:
- xdp-features
- xdp-zc-max-segs
- xdp-rx-metadata-features
- xsk-features
dump:
reply: *dev-all
-
Expand Down
1 change: 1 addition & 0 deletions Documentation/networking/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ Contents:
xfrm_sync
xfrm_sysctl
xdp-rx-metadata
xsk-tx-metadata

.. only:: subproject and html

Expand Down
2 changes: 2 additions & 0 deletions Documentation/networking/xdp-rx-metadata.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.. SPDX-License-Identifier: GPL-2.0
===============
XDP RX Metadata
===============
Expand Down
79 changes: 79 additions & 0 deletions Documentation/networking/xsk-tx-metadata.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
==================
AF_XDP TX Metadata
==================

This document describes how to enable offloads when transmitting packets
via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
metadata on the receive side.

General Design
==============

The headroom for the metadata is reserved via ``tx_metadata_len`` in
``struct xdp_umem_reg``. The metadata length is therefore the same for
every socket that shares the same umem. The metadata layout is a fixed UAPI,
refer to ``union xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
Thus, generally, the ``tx_metadata_len`` field above should contain
``sizeof(union xsk_tx_metadata)``.

The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
layout is as follows::

tx_metadata_len
/ \
+-----------------+---------+----------------------------+
| xsk_tx_metadata | padding | payload |
+-----------------+---------+----------------------------+
^
|
xdp_desc->addr

An AF_XDP application can request headrooms larger than ``sizeof(struct
xsk_tx_metadata)``. The kernel will ignore the padding (and will still
use ``xdp_desc->addr - tx_metadata_len`` to locate
the ``xsk_tx_metadata``). For the frames that shouldn't carry
any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
the metadata area is ignored by the kernel as well.

The flags field enables the particular offload:

- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put transmission
timestamp into ``tx_timestamp`` field of ``union xsk_tx_metadata``.
- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate L4
checksum. ``csum_start`` specifies byte offset of where the checksumming
should start and ``csum_offset`` specifies byte offset where the
device should store the computed checksum.

Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
bit in the ``options`` field. Also note that in a multi-buffer packet
only the first chunk should carry the metadata.

Software TX Checksum
====================

For development and testing purposes its possible to pass
``XDP_UMEM_TX_SW_CSUM`` flag to ``XDP_UMEM_REG`` UMEM registration call.
In this case, when running in ``XDK_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.

Querying Device Capabilities
============================

Every devices exports its offloads capabilities via netlink netdev family.
Refer to ``xsk-flags`` features bitmask in
``Documentation/netlink/specs/netdev.yaml``.

- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``

See ``tools/net/ynl/samples/netdev.c`` on how to query this information.

Example
=======

See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
program that handles TX metadata. Also see https://github.com/fomichev/xskgen
for a more bare-bones example.
4 changes: 3 additions & 1 deletion drivers/net/ethernet/mellanox/mlx5/core/en.h
Original file line number Diff line number Diff line change
Expand Up @@ -484,10 +484,12 @@ struct mlx5e_xdp_info_fifo {

struct mlx5e_xdpsq;
struct mlx5e_xmit_data;
struct xsk_tx_metadata;
typedef int (*mlx5e_fp_xmit_xdp_frame_check)(struct mlx5e_xdpsq *);
typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq *,
struct mlx5e_xmit_data *,
int);
int,
struct xsk_tx_metadata *);

struct mlx5e_xdpsq {
/* data path */
Expand Down
72 changes: 61 additions & 11 deletions drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
xdptxd->dma_addr = dma_addr;

if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
mlx5e_xmit_xdp_frame, sq, xdptxd, 0)))
mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL)))
return false;

/* xmit_mode == MLX5E_XDP_XMIT_MODE_FRAME */
Expand Down Expand Up @@ -145,7 +145,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
xdptxd->dma_addr = dma_addr;

if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
mlx5e_xmit_xdp_frame, sq, xdptxd, 0)))
mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL)))
return false;

/* xmit_mode == MLX5E_XDP_XMIT_MODE_PAGE */
Expand Down Expand Up @@ -261,6 +261,37 @@ const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = {
.xmo_rx_hash = mlx5e_xdp_rx_hash,
};

struct mlx5e_xsk_tx_complete {
struct mlx5_cqe64 *cqe;
struct mlx5e_cq *cq;
};

static u64 mlx5e_xsk_fill_timestamp(void *_priv)
{
struct mlx5e_xsk_tx_complete *priv = _priv;
u64 ts;

ts = get_cqe_ts(priv->cqe);

if (mlx5_is_real_time_rq(priv->cq->mdev) || mlx5_is_real_time_sq(priv->cq->mdev))
return mlx5_real_time_cyc2time(&priv->cq->mdev->clock, ts);

return mlx5_timecounter_cyc2time(&priv->cq->mdev->clock, ts);
}

static void mlx5e_xsk_request_checksum(u16 csum_start, u16 csum_offset, void *priv)
{
struct mlx5_wqe_eth_seg *eseg = priv;

/* HW/FW is doing parsing, so offsets are largely ignored. */
eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
}

const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops = {
.tmo_fill_timestamp = mlx5e_xsk_fill_timestamp,
.tmo_request_checksum = mlx5e_xsk_request_checksum,
};

/* returns true if packet was consumed by xdp */
bool mlx5e_xdp_handle(struct mlx5e_rq *rq,
struct bpf_prog *prog, struct mlx5e_xdp_buff *mxbuf)
Expand Down Expand Up @@ -398,11 +429,11 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq

INDIRECT_CALLABLE_SCOPE bool
mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
int check_result);
int check_result, struct xsk_tx_metadata *meta);

INDIRECT_CALLABLE_SCOPE bool
mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
int check_result)
int check_result, struct xsk_tx_metadata *meta)
{
struct mlx5e_tx_mpwqe *session = &sq->mpwqe;
struct mlx5e_xdpsq_stats *stats = sq->stats;
Expand All @@ -420,7 +451,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx
*/
if (unlikely(sq->mpwqe.wqe))
mlx5e_xdp_mpwqe_complete(sq);
return mlx5e_xmit_xdp_frame(sq, xdptxd, 0);
return mlx5e_xmit_xdp_frame(sq, xdptxd, 0, meta);
}
if (!xdptxd->len) {
skb_frag_t *frag = &xdptxdf->sinfo->frags[0];
Expand Down Expand Up @@ -450,6 +481,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx
* and it's safe to complete it at any time.
*/
mlx5e_xdp_mpwqe_session_start(sq);
xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, &session->wqe->eth);
}

mlx5e_xdp_mpwqe_add_dseg(sq, p, stats);
Expand Down Expand Up @@ -480,7 +512,7 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq)

INDIRECT_CALLABLE_SCOPE bool
mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
int check_result)
int check_result, struct xsk_tx_metadata *meta)
{
struct mlx5e_xmit_data_frags *xdptxdf =
container_of(xdptxd, struct mlx5e_xmit_data_frags, xd);
Expand Down Expand Up @@ -599,6 +631,8 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
sq->pc++;
}

xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, eseg);

sq->doorbell_cseg = cseg;

stats->xmit++;
Expand All @@ -608,7 +642,9 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd,
static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
struct mlx5e_xdp_wqe_info *wi,
u32 *xsk_frames,
struct xdp_frame_bulk *bq)
struct xdp_frame_bulk *bq,
struct mlx5e_cq *cq,
struct mlx5_cqe64 *cqe)
{
struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
u16 i;
Expand Down Expand Up @@ -668,10 +704,24 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,

break;
}
case MLX5E_XDP_XMIT_MODE_XSK:
case MLX5E_XDP_XMIT_MODE_XSK: {
/* AF_XDP send */
struct xsk_tx_metadata_compl *compl = NULL;
struct mlx5e_xsk_tx_complete priv = {
.cqe = cqe,
.cq = cq,
};

if (xp_tx_metadata_enabled(sq->xsk_pool)) {
xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
compl = &xdpi.xsk_meta;

xsk_tx_metadata_complete(compl, &mlx5e_xsk_tx_metadata_ops, &priv);
}

(*xsk_frames)++;
break;
}
default:
WARN_ON_ONCE(true);
}
Expand Down Expand Up @@ -720,7 +770,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)

sqcc += wi->num_wqebbs;

mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, cq, cqe);
} while (!last_wqe);

if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) {
Expand Down Expand Up @@ -767,7 +817,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq)

sq->cc += wi->num_wqebbs;

mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq);
mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, NULL, NULL);
}

xdp_flush_frame_bulk(&bq);
Expand Down Expand Up @@ -840,7 +890,7 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
}

ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe,
mlx5e_xmit_xdp_frame, sq, xdptxd, 0);
mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL);
if (unlikely(!ret)) {
int j;

Expand Down
11 changes: 8 additions & 3 deletions drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
#define __MLX5_EN_XDP_H__

#include <linux/indirect_call_wrapper.h>
#include <net/xdp_sock.h>

#include "en.h"
#include "en/txrx.h"
Expand Down Expand Up @@ -82,7 +83,7 @@ enum mlx5e_xdp_xmit_mode {
* num, page_1, page_2, ... , page_num.
*
* MLX5E_XDP_XMIT_MODE_XSK:
* none.
* frame.xsk_meta.
*/
#define MLX5E_XDP_FIFO_ENTRIES2DS_MAX_RATIO 4

Expand All @@ -97,6 +98,7 @@ union mlx5e_xdp_info {
u8 num;
struct page *page;
} page;
struct xsk_tx_metadata_compl xsk_meta;
};

struct mlx5e_xsk_param;
Expand All @@ -112,13 +114,16 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
u32 flags);

extern const struct xdp_metadata_ops mlx5e_xdp_metadata_ops;
extern const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops;

INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq,
struct mlx5e_xmit_data *xdptxd,
int check_result));
int check_result,
struct xsk_tx_metadata *meta));
INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq,
struct mlx5e_xmit_data *xdptxd,
int check_result));
int check_result,
struct xsk_tx_metadata *meta));
INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq *sq));
INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq));

Expand Down
Loading

0 comments on commit 753c860

Please sign in to comment.