Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Browse files Browse the repository at this point in the history
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-01-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a start of a framework for extending struct xdp_buff without
   having the overhead of populating every data at runtime. Idea
   is to have a new per-queue struct xdp_rxq_info that holds read
   mostly data (currently that is, queue number and a pointer to
   the corresponding netdev) which is set up during rxqueue config
   time. When a XDP program is invoked, struct xdp_buff holds a
   pointer to struct xdp_rxq_info that the BPF program can then
   walk. The user facing BPF program that uses struct xdp_md for
   context can use these members directly, and the verifier rewrites
   context access transparently by walking the xdp_rxq_info and
   net_device pointers to load the data, from Jesper.

2) Redo the reporting of offload device information to user space
   such that it works in combination with network namespaces. The
   latter is reported through a device/inode tuple as similarly
   done in other subsystems as well (e.g. perf) in order to identify
   the namespace. For this to work, ns_get_path() has been generalized
   such that the namespace can be retrieved not only from a specific
   task (perf case), but also from a callback where we deduce the
   netns (ns_common) from a netdevice. bpftool support using the new
   uapi info and extensive test cases for test_offload.py in BPF
   selftests have been added as well, from Jakub.

3) Add two bpftool improvements: i) properly report the bpftool
   version such that it corresponds to the version from the kernel
   source tree. So pick the right linux/version.h from the source
   tree instead of the installed one. ii) fix bpftool and also
   bpf_jit_disasm build with bintutils >= 2.9. The reason for the
   build breakage is that binutils library changed the function
   signature to select the disassembler. Given this is needed in
   multiple tools, add a proper feature detection to the
   tools/build/features infrastructure, from Roman.

4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
   stacktrace map. It is currently unimplemented, but there are
   use cases where user space needs to walk all stacktrace map
   entries e.g. for dumping or deleting map entries w/o having to
   close and recreate the map. Add BPF selftests along with it,
   from Yonghong.

5) Few follow-up cleanups for the bpftool cgroup code: i) rename
   the cgroup 'list' command into 'show' as we have it for other
   subcommands as well, ii) then alias the 'show' command such that
   'list' is accepted which is also common practice in iproute2,
   and iii) remove couple of newlines from error messages using
   p_err(), from Jakub.

6) Two follow-up cleanups to sockmap code: i) remove the unused
   bpf_compute_data_end_sk_skb() function and ii) only build the
   sockmap infrastructure when CONFIG_INET is enabled since it's
   only aware of TCP sockets at this time, from John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jan 8, 2018
2 parents d0adb51 + 9be99ba commit 7f0b800
Show file tree
Hide file tree
Showing 72 changed files with 1,687 additions and 178 deletions.
10 changes: 10 additions & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.c
Original file line number Diff line number Diff line change
Expand Up @@ -2247,6 +2247,9 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
if (rxr->xdp_prog)
bpf_prog_put(rxr->xdp_prog);

if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
xdp_rxq_info_unreg(&rxr->xdp_rxq);

kfree(rxr->rx_tpa);
rxr->rx_tpa = NULL;

Expand Down Expand Up @@ -2280,6 +2283,10 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)

ring = &rxr->rx_ring_struct;

rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
if (rc < 0)
return rc;

rc = bnxt_alloc_ring(bp, ring);
if (rc)
return rc;
Expand Down Expand Up @@ -2834,6 +2841,9 @@ void bnxt_set_ring_params(struct bnxt *bp)
bp->cp_ring_mask = bp->cp_bit - 1;
}

/* Changing allocation mode of RX rings.
* TODO: Update when extending xdp_rxq_info to support allocation modes.
*/
int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode)
{
if (page_mode) {
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#include <net/devlink.h>
#include <net/dst_metadata.h>
#include <net/switchdev.h>
#include <net/xdp.h>

struct tx_bd {
__le32 tx_bd_len_flags_type;
Expand Down Expand Up @@ -664,6 +665,7 @@ struct bnxt_rx_ring_info {

struct bnxt_ring_struct rx_ring_struct;
struct bnxt_ring_struct rx_agg_ring_struct;
struct xdp_rxq_info xdp_rxq;
};

struct bnxt_cp_ring_info {
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
xdp.data = *data_ptr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = *data_ptr + *len;
xdp.rxq = &rxr->xdp_rxq;
orig_data = xdp.data;
mapping = rx_buf->mapping - bp->rx_dma_offset;

Expand Down
11 changes: 7 additions & 4 deletions drivers/net/ethernet/cavium/thunder/nicvf_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -521,7 +521,7 @@ static void nicvf_unmap_page(struct nicvf *nic, struct page *page, u64 dma_addr)

static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
struct cqe_rx_t *cqe_rx, struct snd_queue *sq,
struct sk_buff **skb)
struct rcv_queue *rq, struct sk_buff **skb)
{
struct xdp_buff xdp;
struct page *page;
Expand All @@ -545,6 +545,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
xdp.data = (void *)cpu_addr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
xdp.rxq = &rq->xdp_rxq;
orig_data = xdp.data;

rcu_read_lock();
Expand Down Expand Up @@ -698,7 +699,8 @@ static inline void nicvf_set_rxhash(struct net_device *netdev,

static void nicvf_rcv_pkt_handler(struct net_device *netdev,
struct napi_struct *napi,
struct cqe_rx_t *cqe_rx, struct snd_queue *sq)
struct cqe_rx_t *cqe_rx,
struct snd_queue *sq, struct rcv_queue *rq)
{
struct sk_buff *skb = NULL;
struct nicvf *nic = netdev_priv(netdev);
Expand All @@ -724,7 +726,7 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
/* For XDP, ignore pkts spanning multiple pages */
if (nic->xdp_prog && (cqe_rx->rb_cnt == 1)) {
/* Packet consumed by XDP */
if (nicvf_xdp_rx(snic, nic->xdp_prog, cqe_rx, sq, &skb))
if (nicvf_xdp_rx(snic, nic->xdp_prog, cqe_rx, sq, rq, &skb))
return;
} else {
skb = nicvf_get_rcv_skb(snic, cqe_rx,
Expand Down Expand Up @@ -781,6 +783,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,
struct cqe_rx_t *cq_desc;
struct netdev_queue *txq;
struct snd_queue *sq = &qs->sq[cq_idx];
struct rcv_queue *rq = &qs->rq[cq_idx];
unsigned int tx_pkts = 0, tx_bytes = 0, txq_idx;

spin_lock_bh(&cq->lock);
Expand Down Expand Up @@ -811,7 +814,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,

switch (cq_desc->cqe_type) {
case CQE_TYPE_RX:
nicvf_rcv_pkt_handler(netdev, napi, cq_desc, sq);
nicvf_rcv_pkt_handler(netdev, napi, cq_desc, sq, rq);
work_done++;
break;
case CQE_TYPE_SEND:
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/cavium/thunder/nicvf_queues.c
Original file line number Diff line number Diff line change
Expand Up @@ -760,6 +760,7 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,

if (!rq->enable) {
nicvf_reclaim_rcv_queue(nic, qs, qidx);
xdp_rxq_info_unreg(&rq->xdp_rxq);
return;
}

Expand All @@ -772,6 +773,9 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
/* all writes of RBDR data to be loaded into L2 Cache as well*/
rq->caching = 1;

/* Driver have no proper error path for failed XDP RX-queue info reg */
WARN_ON(xdp_rxq_info_reg(&rq->xdp_rxq, nic->netdev, qidx) < 0);

/* Send a mailbox msg to PF to config RQ */
mbx.rq.msg = NIC_MBOX_MSG_RQ_CFG;
mbx.rq.qs_num = qs->vnic_id;
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/cavium/thunder/nicvf_queues.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <linux/netdevice.h>
#include <linux/iommu.h>
#include <linux/bpf.h>
#include <net/xdp.h>
#include "q_struct.h"

#define MAX_QUEUE_SET 128
Expand Down Expand Up @@ -255,6 +256,7 @@ struct rcv_queue {
u8 start_qs_rbdr_idx; /* RBDR idx in the above QS */
u8 caching;
struct rx_tx_queue_stats stats;
struct xdp_rxq_info xdp_rxq;
} ____cacheline_aligned_in_smp;

struct cmp_queue {
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/intel/i40e/i40e_ethtool.c
Original file line number Diff line number Diff line change
Expand Up @@ -1585,6 +1585,8 @@ static int i40e_set_ringparam(struct net_device *netdev,
*/
rx_rings[i].desc = NULL;
rx_rings[i].rx_bi = NULL;
/* Clear cloned XDP RX-queue info before setup call */
memset(&rx_rings[i].xdp_rxq, 0, sizeof(rx_rings[i].xdp_rxq));
/* this is to allow wr32 to have something to write to
* during early allocation of Rx buffers
*/
Expand Down
18 changes: 16 additions & 2 deletions drivers/net/ethernet/intel/i40e/i40e_txrx.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include <linux/prefetch.h>
#include <net/busy_poll.h>
#include <linux/bpf_trace.h>
#include <net/xdp.h>
#include "i40e.h"
#include "i40e_trace.h"
#include "i40e_prototype.h"
Expand Down Expand Up @@ -1236,6 +1237,8 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
void i40e_free_rx_resources(struct i40e_ring *rx_ring)
{
i40e_clean_rx_ring(rx_ring);
if (rx_ring->vsi->type == I40E_VSI_MAIN)
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
rx_ring->xdp_prog = NULL;
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;
Expand All @@ -1256,6 +1259,7 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
{
struct device *dev = rx_ring->dev;
int err = -ENOMEM;
int bi_size;

/* warn if we are about to overwrite the pointer */
Expand Down Expand Up @@ -1283,13 +1287,21 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

/* XDP RX-queue info only needed for RX rings exposed to XDP */
if (rx_ring->vsi->type == I40E_VSI_MAIN) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
rx_ring->queue_index);
if (err < 0)
goto err;
}

rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;

return 0;
err:
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;
return -ENOMEM;
return err;
}

/**
Expand Down Expand Up @@ -2068,11 +2080,13 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
struct sk_buff *skb = rx_ring->skb;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
bool failure = false, xdp_xmit = false;
struct xdp_buff xdp;

xdp.rxq = &rx_ring->xdp_rxq;

while (likely(total_rx_packets < (unsigned int)budget)) {
struct i40e_rx_buffer *rx_buffer;
union i40e_rx_desc *rx_desc;
struct xdp_buff xdp;
unsigned int size;
u16 vlan_tag;
u8 rx_ptype;
Expand Down
3 changes: 3 additions & 0 deletions drivers/net/ethernet/intel/i40e/i40e_txrx.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@
#ifndef _I40E_TXRX_H_
#define _I40E_TXRX_H_

#include <net/xdp.h>

/* Interrupt Throttling and Rate Limiting Goodies */

#define I40E_MAX_ITR 0x0FF0 /* reg uses 2 usec resolution */
Expand Down Expand Up @@ -428,6 +430,7 @@ struct i40e_ring {
*/

struct i40e_channel *ch;
struct xdp_rxq_info xdp_rxq;
} ____cacheline_internodealigned_in_smp;

static inline bool ring_uses_build_skb(struct i40e_ring *ring)
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/ethernet/intel/ixgbe/ixgbe.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@
#include <linux/dca.h>
#endif

#include <net/xdp.h>
#include <net/busy_poll.h>

/* common prefix used by pr_<> macros */
Expand Down Expand Up @@ -371,6 +372,7 @@ struct ixgbe_ring {
struct ixgbe_tx_queue_stats tx_stats;
struct ixgbe_rx_queue_stats rx_stats;
};
struct xdp_rxq_info xdp_rxq;
} ____cacheline_internodealigned_in_smp;

enum ixgbe_ring_f_enum {
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
Original file line number Diff line number Diff line change
Expand Up @@ -1156,6 +1156,10 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
memcpy(&temp_ring[i], adapter->rx_ring[i],
sizeof(struct ixgbe_ring));

/* Clear copied XDP RX-queue info */
memset(&temp_ring[i].xdp_rxq, 0,
sizeof(temp_ring[i].xdp_rxq));

temp_ring[i].count = new_rx_count;
err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
if (err) {
Expand Down
10 changes: 9 additions & 1 deletion drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -2318,12 +2318,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
#endif /* IXGBE_FCOE */
u16 cleaned_count = ixgbe_desc_unused(rx_ring);
bool xdp_xmit = false;
struct xdp_buff xdp;

xdp.rxq = &rx_ring->xdp_rxq;

while (likely(total_rx_packets < budget)) {
union ixgbe_adv_rx_desc *rx_desc;
struct ixgbe_rx_buffer *rx_buffer;
struct sk_buff *skb;
struct xdp_buff xdp;
unsigned int size;

/* return some buffers to hardware, one at a time is too slow */
Expand Down Expand Up @@ -6444,6 +6446,11 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

/* XDP RX-queue info */
if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
rx_ring->queue_index) < 0)
goto err;

rx_ring->xdp_prog = adapter->xdp_prog;

return 0;
Expand Down Expand Up @@ -6541,6 +6548,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
ixgbe_clean_rx_ring(rx_ring);

rx_ring->xdp_prog = NULL;
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
vfree(rx_ring->rx_buffer_info);
rx_ring->rx_buffer_info = NULL;

Expand Down
3 changes: 2 additions & 1 deletion drivers/net/ethernet/mellanox/mlx4/en_netdev.c
Original file line number Diff line number Diff line change
Expand Up @@ -2172,8 +2172,9 @@ static int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)

if (mlx4_en_create_rx_ring(priv, &priv->rx_ring[i],
prof->rx_ring_size, priv->stride,
node))
node, i))
goto err;

}

#ifdef CONFIG_RFS_ACCEL
Expand Down
13 changes: 10 additions & 3 deletions drivers/net/ethernet/mellanox/mlx4/en_rx.c
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev)

int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride, int node)
u32 size, u16 stride, int node, int queue_index)
{
struct mlx4_en_dev *mdev = priv->mdev;
struct mlx4_en_rx_ring *ring;
Expand All @@ -286,14 +286,17 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
ring->log_stride = ffs(ring->stride) - 1;
ring->buf_size = ring->size * ring->stride + TXBB_SIZE;

if (xdp_rxq_info_reg(&ring->xdp_rxq, priv->dev, queue_index) < 0)
goto err_ring;

tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = vzalloc_node(tmp, node);
if (!ring->rx_info) {
ring->rx_info = vzalloc(tmp);
if (!ring->rx_info) {
err = -ENOMEM;
goto err_ring;
goto err_xdp_info;
}
}

Expand All @@ -317,6 +320,8 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
err_info:
vfree(ring->rx_info);
ring->rx_info = NULL;
err_xdp_info:
xdp_rxq_info_unreg(&ring->xdp_rxq);
err_ring:
kfree(ring);
*pring = NULL;
Expand Down Expand Up @@ -440,6 +445,7 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
lockdep_is_held(&mdev->state_lock));
if (old_prog)
bpf_prog_put(old_prog);
xdp_rxq_info_unreg(&ring->xdp_rxq);
mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
vfree(ring->rx_info);
ring->rx_info = NULL;
Expand Down Expand Up @@ -652,6 +658,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
int cq_ring = cq->ring;
bool doorbell_pending;
struct mlx4_cqe *cqe;
struct xdp_buff xdp;
int polled = 0;
int index;

Expand All @@ -666,6 +673,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
/* Protect accesses to: ring->xdp_prog, priv->mac_hash list */
rcu_read_lock();
xdp_prog = rcu_dereference(ring->xdp_prog);
xdp.rxq = &ring->xdp_rxq;
doorbell_pending = 0;

/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
Expand Down Expand Up @@ -750,7 +758,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
* read bytes but not past the end of the frag.
*/
if (xdp_prog) {
struct xdp_buff xdp;
dma_addr_t dma;
void *orig_data;
u32 act;
Expand Down
4 changes: 3 additions & 1 deletion drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@
#endif
#include <linux/cpu_rmap.h>
#include <linux/ptp_clock_kernel.h>
#include <net/xdp.h>

#include <linux/mlx4/device.h>
#include <linux/mlx4/qp.h>
Expand Down Expand Up @@ -356,6 +357,7 @@ struct mlx4_en_rx_ring {
unsigned long dropped;
int hwtstamp_rx_filter;
cpumask_var_t affinity_mask;
struct xdp_rxq_info xdp_rxq;
};

struct mlx4_en_cq {
Expand Down Expand Up @@ -720,7 +722,7 @@ void mlx4_en_set_num_rx_rings(struct mlx4_en_dev *mdev);
void mlx4_en_recover_from_oom(struct mlx4_en_priv *priv);
int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride, int node);
u32 size, u16 stride, int node, int queue_index);
void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring **pring,
u32 size, u16 stride);
Expand Down
Loading

0 comments on commit 7f0b800

Please sign in to comment.