Skip to content

Commit

Permalink
Merge branch 'xdp_rxq_info'
Browse files Browse the repository at this point in the history
Jesper Dangaard Brouer says:

====================
V4:
* Added reviewers/acks to patches
* Fix patch desc in i40e that got out-of-sync with code
* Add SPDX license headers for the two new files added in patch 14

V3:
* Fixed bug in virtio_net driver
* Removed export of xdp_rxq_info_init()

V2:
* Changed API exposed to drivers
  - Removed invocation of "init" in drivers, and only call "reg"
    (Suggested by Saeed)
  - Allow "reg" to fail and handle this in drivers
    (Suggested by David Ahern)
* Removed the SINKQ qtype, instead allow to register as "unused"
* Also fixed some drivers during testing on actual HW (noted in patches)

There is a need for XDP to know more about the RX-queue a given XDP
frames have arrived on.  For both the XDP bpf-prog and kernel side.

Instead of extending struct xdp_buff each time new info is needed,
this patchset takes a different approach.  Struct xdp_buff is only
extended with a pointer to a struct xdp_rxq_info (allowing for easier
extending this later).  This xdp_rxq_info contains information related
to how the driver have setup the individual RX-queue's.  This is
read-mostly information, and all xdp_buff frames (in drivers
napi_poll) point to the same xdp_rxq_info (per RX-queue).

We stress this data/cache-line is for read-mostly info.  This is NOT
for dynamic per packet info, use the data_meta for such use-cases.

This patchset start out small, and only expose ingress_ifindex and the
RX-queue index to the XDP/BPF program. Access to tangible info like
the ingress ifindex and RX queue index, is fairly easy to comprehent.
The other future use-cases could allow XDP frames to be recycled back
to the originating device driver, by providing info on RX device and
queue number.

As XDP doesn't have driver feature flags, and eBPF code due to
bpf-tail-calls cannot determine that XDP driver invoke it, this
patchset have to update every driver that support XDP.

For driver developers (review individual driver patches!):

The xdp_rxq_info is tied to the drivers RX-ring(s). Whenever a RX-ring
modification require (temporary) stopping RX frames, then the
xdp_rxq_info should (likely) also be unregistred and re-registered,
especially if reallocating the pages in the ring. Make sure ethtool
set_channels does the right thing. When replacing XDP prog, if and
only if RX-ring need to be changed, then also re-register the
xdp_rxq_info.

I'm Cc'ing the individual driver patches to the registered maintainers.

Testing:

I've only tested the NIC drivers I have hardware for.  The general
test procedure is to (DUT = Device Under Test):
 (1) run pktgen script pktgen_sample04_many_flows.sh       (against DUT)
 (2) run samples/bpf program xdp_rxq_info --dev $DEV       (on DUT)
 (3) runtime modify number of NIC queues via ethtool -L    (on DUT)
 (4) runtime modify number of NIC ring-size via ethtool -G (on DUT)

Patch based on git tree bpf-next (at commit fb98266):
 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  • Loading branch information
Alexei Starovoitov committed Jan 5, 2018
2 parents 5f103c5 + 0fca931 commit 11d16ed
Show file tree
Hide file tree
Showing 36 changed files with 990 additions and 28 deletions.
10 changes: 10 additions & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.c
Original file line number Diff line number Diff line change
Expand Up @@ -2247,6 +2247,9 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
if (rxr->xdp_prog)
bpf_prog_put(rxr->xdp_prog);

if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
xdp_rxq_info_unreg(&rxr->xdp_rxq);

kfree(rxr->rx_tpa);
rxr->rx_tpa = NULL;

Expand Down Expand Up @@ -2280,6 +2283,10 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)

ring = &rxr->rx_ring_struct;

rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
if (rc < 0)
return rc;

rc = bnxt_alloc_ring(bp, ring);
if (rc)
return rc;
Expand Down Expand Up @@ -2834,6 +2841,9 @@ void bnxt_set_ring_params(struct bnxt *bp)
bp->cp_ring_mask = bp->cp_bit - 1;
}

/* Changing allocation mode of RX rings.
* TODO: Update when extending xdp_rxq_info to support allocation modes.
*/
int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode)
{
if (page_mode) {
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#include <net/devlink.h>
#include <net/dst_metadata.h>
#include <net/switchdev.h>
#include <net/xdp.h>

struct tx_bd {
__le32 tx_bd_len_flags_type;
Expand Down Expand Up @@ -664,6 +665,7 @@ struct bnxt_rx_ring_info {

struct bnxt_ring_struct rx_ring_struct;
struct bnxt_ring_struct rx_agg_ring_struct;
struct xdp_rxq_info xdp_rxq;
};

struct bnxt_cp_ring_info {
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
xdp.data = *data_ptr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = *data_ptr + *len;
xdp.rxq = &rxr->xdp_rxq;
orig_data = xdp.data;
mapping = rx_buf->mapping - bp->rx_dma_offset;

Expand Down
11 changes: 7 additions & 4 deletions drivers/net/ethernet/cavium/thunder/nicvf_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -521,7 +521,7 @@ static void nicvf_unmap_page(struct nicvf *nic, struct page *page, u64 dma_addr)

static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
struct cqe_rx_t *cqe_rx, struct snd_queue *sq,
struct sk_buff **skb)
struct rcv_queue *rq, struct sk_buff **skb)
{
struct xdp_buff xdp;
struct page *page;
Expand All @@ -545,6 +545,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
xdp.data = (void *)cpu_addr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
xdp.rxq = &rq->xdp_rxq;
orig_data = xdp.data;

rcu_read_lock();
Expand Down Expand Up @@ -698,7 +699,8 @@ static inline void nicvf_set_rxhash(struct net_device *netdev,

static void nicvf_rcv_pkt_handler(struct net_device *netdev,
struct napi_struct *napi,
struct cqe_rx_t *cqe_rx, struct snd_queue *sq)
struct cqe_rx_t *cqe_rx,
struct snd_queue *sq, struct rcv_queue *rq)
{
struct sk_buff *skb = NULL;
struct nicvf *nic = netdev_priv(netdev);
Expand All @@ -724,7 +726,7 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
/* For XDP, ignore pkts spanning multiple pages */
if (nic->xdp_prog && (cqe_rx->rb_cnt == 1)) {
/* Packet consumed by XDP */
if (nicvf_xdp_rx(snic, nic->xdp_prog, cqe_rx, sq, &skb))
if (nicvf_xdp_rx(snic, nic->xdp_prog, cqe_rx, sq, rq, &skb))
return;
} else {
skb = nicvf_get_rcv_skb(snic, cqe_rx,
Expand Down Expand Up @@ -781,6 +783,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,
struct cqe_rx_t *cq_desc;
struct netdev_queue *txq;
struct snd_queue *sq = &qs->sq[cq_idx];
struct rcv_queue *rq = &qs->rq[cq_idx];
unsigned int tx_pkts = 0, tx_bytes = 0, txq_idx;

spin_lock_bh(&cq->lock);
Expand Down Expand Up @@ -811,7 +814,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,

switch (cq_desc->cqe_type) {
case CQE_TYPE_RX:
nicvf_rcv_pkt_handler(netdev, napi, cq_desc, sq);
nicvf_rcv_pkt_handler(netdev, napi, cq_desc, sq, rq);
work_done++;
break;
case CQE_TYPE_SEND:
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/cavium/thunder/nicvf_queues.c
Original file line number Diff line number Diff line change
Expand Up @@ -760,6 +760,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,

if (!rq->enable) {
nicvf_reclaim_rcv_queue(nic, qs, qidx);
xdp_rxq_info_unreg(&rq->xdp_rxq);
return;
}

Expand All @@ -772,6 +773,9 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
/* all writes of RBDR data to be loaded into L2 Cache as well*/
rq->caching = 1;

/* Driver have no proper error path for failed XDP RX-queue info reg */
WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);

/* Send a mailbox msg to PF to config RQ */
mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
mbx.rq.qs_num = qs->vnic_id;
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/cavium/thunder/nicvf_queues.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <linux/netdevice.h>
#include <linux/iommu.h>
#include <linux/bpf.h>
#include <net/xdp.h>
#include "q_struct.h"

#define MAX_QUEUE_SET 128
Expand Down Expand Up @@ -255,6 +256,7 @@ struct rcv_queue {
u8 start_qs_rbdr_idx; /* RBDR idx in the above QS */
u8 caching;
struct rx_tx_queue_stats stats;
struct xdp_rxq_info xdp_rxq;
} ____cacheline_aligned_in_smp;

struct cmp_queue {
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/intel/i40e/i40e_ethtool.c
Original file line number Diff line number Diff line change
Expand Up @@ -1585,6 +1585,8 @@ static int i40e_set_ringparam(struct net_device *netdev,
*/
rx_rings[i].desc = NULL;
rx_rings[i].rx_bi = NULL;
/* Clear cloned XDP RX-queue info before setup call */
memset(&rx_rings[i].xdp_rxq, 0, sizeof(rx_rings[i].xdp_rxq));
/* this is to allow wr32 to have something to write to
* during early allocation of Rx buffers
*/
Expand Down
18 changes: 16 additions & 2 deletions drivers/net/ethernet/intel/i40e/i40e_txrx.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include <linux/prefetch.h>
#include <net/busy_poll.h>
#include <linux/bpf_trace.h>
#include <net/xdp.h>
#include "i40e.h"
#include "i40e_trace.h"
#include "i40e_prototype.h"
Expand Down Expand Up @@ -1236,6 +1237,8 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
void i40e_free_rx_resources(struct i40e_ring *rx_ring)
{
i40e_clean_rx_ring(rx_ring);
if (rx_ring->vsi->type == I40E_VSI_MAIN)
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
rx_ring->xdp_prog = NULL;
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;
Expand All @@ -1256,6 +1259,7 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
{
struct device *dev = rx_ring->dev;
int err = -ENOMEM;
int bi_size;

/* warn if we are about to overwrite the pointer */
Expand Down Expand Up @@ -1283,13 +1287,21 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

/* XDP RX-queue info only needed for RX rings exposed to XDP */
if (rx_ring->vsi->type == I40E_VSI_MAIN) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
rx_ring->queue_index);
if (err < 0)
goto err;
}

rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;

return 0;
err:
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;
return -ENOMEM;
return err;
}

/**
Expand Down Expand Up @@ -2068,11 +2080,13 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
struct sk_buff *skb = rx_ring->skb;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
bool failure = false, xdp_xmit = false;
struct xdp_buff xdp;

xdp.rxq = &rx_ring->xdp_rxq;

while (likely(total_rx_packets < (unsigned int)budget)) {
struct i40e_rx_buffer *rx_buffer;
union i40e_rx_desc *rx_desc;
struct xdp_buff xdp;
unsigned int size;
u16 vlan_tag;
u8 rx_ptype;
Expand Down
3 changes: 3 additions & 0 deletions drivers/net/ethernet/intel/i40e/i40e_txrx.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@
#ifndef _I40E_TXRX_H_
#define _I40E_TXRX_H_

#include <net/xdp.h>

/* Interrupt Throttling and Rate Limiting Goodies */

#define I40E_MAX_ITR 0x0FF0 /* reg uses 2 usec resolution */
Expand Down Expand Up @@ -428,6 +430,7 @@ struct i40e_ring {
*/

struct i40e_channel *ch;
struct xdp_rxq_info xdp_rxq;
} ____cacheline_internodealigned_in_smp;

static inline bool ring_uses_build_skb(struct i40e_ring *ring)
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/intel/ixgbe/ixgbe.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@
#include <linux/dca.h>
#endif

#include <net/xdp.h>
#include <net/busy_poll.h>

/* common prefix used by pr_<> macros */
Expand Down Expand Up @@ -371,6 +372,7 @@ struct ixgbe_ring {
struct ixgbe_tx_queue_stats tx_stats;
struct ixgbe_rx_queue_stats rx_stats;
};
struct xdp_rxq_info xdp_rxq;
} ____cacheline_internodealigned_in_smp;

enum ixgbe_ring_f_enum {
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
Original file line number Diff line number Diff line change
Expand Up @@ -1156,6 +1156,10 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
memcpy(&temp_ring[i], adapter->rx_ring[i],
sizeof(struct ixgbe_ring));

/* Clear copied XDP RX-queue info */
memset(&temp_ring[i].xdp_rxq, 0,
sizeof(temp_ring[i].xdp_rxq));

temp_ring[i].count = new_rx_count;
err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
if (err) {
Expand Down
10 changes: 9 additions & 1 deletion drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -2318,12 +2318,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
#endif /* IXGBE_FCOE */
u16 cleaned_count = ixgbe_desc_unused(rx_ring);
bool xdp_xmit = false;
struct xdp_buff xdp;

xdp.rxq = &rx_ring->xdp_rxq;

while (likely(total_rx_packets < budget)) {
union ixgbe_adv_rx_desc *rx_desc;
struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
struct xdp_buff xdp;
unsigned int size;

/* return some buffers to hardware, one at a time is too slow */
Expand Down Expand Up @@ -6444,6 +6446,11 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
rx_ring->queue_index) < 0)
goto err;

rx_ring->xdp_prog = adapter->xdp_prog;

return 0;
Expand Down Expand Up @@ -6541,6 +6548,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
ixgbe_clean_rx_ring(rx_ring);

rx_ring->xdp_prog = NULL;
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
vfree(rx_ring->rx_buffer_info);
rx_ring->rx_buffer_info = NULL;

Expand Down
3 changes: 2 additions & 1 deletion drivers/net/ethernet/mellanox/mlx4/en_netdev.c
Original file line number Diff line number Diff line change
Expand Up @@ -2172,8 +2172,9 @@ static int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)

if (mlx4_en_create_rx_ring(priv, &priv->rx_ring[i],
prof->rx_ring_size, priv->stride,
node))
node, i))
goto err;

}

#ifdef CONFIG_RFS_ACCEL
Expand Down
13 changes: 10 additions & 3 deletions drivers/net/ethernet/mellanox/mlx4/en_rx.c
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev)

int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride, int node)
u32 size, u16 stride, int node, int queue_index)
{
struct mlx4_en_dev *mdev = priv->mdev;
struct mlx4_en_rx_ring *ring;
Expand All @@ -286,14 +286,17 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;

if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
goto err_ring;

tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = vzalloc_node(tmp, node);
if (!ring->rx_info) {
ring->rx_info = vzalloc(tmp);
if (!ring->rx_info) {
err = -ENOMEM;
goto err_ring;
goto err_xdp_info;
}
}

Expand All @@ -317,6 +320,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
err_info:
vfree(ring->rx_info);
ring->rx_info = NULL;
err_xdp_info:
xdp_rxq_info_unreg(&ring->xdp_rxq);
err_ring:
kfree(ring);
*pring = NULL;
Expand Down Expand Up @@ -440,6 +445,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
lockdep_is_held(&mdev->state_lock));
if (old_prog)
bpf_prog_put(old_prog);
xdp_rxq_info_unreg(&ring->xdp_rxq);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
vfree(ring->rx_info);
ring->rx_info = NULL;
Expand Down Expand Up @@ -652,6 +658,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
int cq_ring = cq->ring;
bool doorbell_pending;
struct mlx4_cqe *cqe;
struct xdp_buff xdp;
int polled = 0;
int index;

Expand All @@ -666,6 +673,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Protect accesses to: ring->xdp_prog, priv->mac_hash list */
rcu_read_lock();
xdp_prog = rcu_dereference(ring->xdp_prog);
xdp.rxq = &ring->xdp_rxq;
doorbell_pending = 0;

/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
Expand Down Expand Up @@ -750,7 +758,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
* read bytes but not past the end of the frag.
*/
if (xdp_prog) {
struct xdp_buff xdp;
dma_addr_t dma;
void *orig_data;
u32 act;
Expand Down
4 changes: 3 additions & 1 deletion drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@
#endif
#include <linux/cpu_rmap.h>
#include <linux/ptp_clock_kernel.h>
#include <net/xdp.h>

#include <linux/mlx4/device.h>
#include <linux/mlx4/qp.h>
Expand Down Expand Up @@ -356,6 +357,7 @@ struct mlx4_en_rx_ring {
unsigned long dropped;
int hwtstamp_rx_filter;
cpumask_var_t affinity_mask;
struct xdp_rxq_info xdp_rxq;
};

struct mlx4_en_cq {
Expand Down Expand Up @@ -720,7 +722,7 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev);
void mlx4_en_recover_from_oom(struct mlx4_en_priv *priv);
int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride, int node);
u32 size, u16 stride, int node, int queue_index);
void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride);
Expand Down
Loading

0 comments on commit 11d16ed

Please sign in to comment.