Merge branch 'atlantic-xdp-multi-buffer'
[PATCH net-next v5 0/3] net: atlantic: Add XDP support
From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
  To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
  Cc: ap420073

This patchset makes the atlantic driver support multi-buffer XDP.

The first patch implements the control plane of XDP.
aq_xdp(), the .ndo_bpf callback, is added.

The second patch implements the data plane of XDP.
XDP_TX, XDP_DROP, and XDP_PASS are supported.
__aq_ring_xdp_clean() is added to receive packets and execute the XDP program.
aq_nic_xmit_xdpf() is added to send packets via XDP.

The third patch implements the .ndo_xdp_xmit callback.
aq_xdp_xmit() is added to send redirected packets; it internally
calls aq_nic_xmit_xdpf().

Memory model is MEM_TYPE_PAGE_SHARED.

Order-2 page allocation is used when XDP is enabled.

LRO will be disabled if the XDP program doesn't support multi-buffer.
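
The LRO rule can be sketched in plain userspace C. This is only a model of the decision, not the driver code: the flag bits and the fix_features() name are hypothetical stand-ins for the kernel's NETIF_F_* flags and the driver's fix-features hook.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's NETIF_F_* values. */
#define F_RXCSUM (1u << 0)
#define F_LRO    (1u << 1)

/*
 * Clear LRO when an XDP program is attached that cannot handle
 * multi-buffer (fragmented) packets. All other features pass through.
 */
static uint32_t fix_features(uint32_t features, bool prog_attached,
			     bool prog_has_frags)
{
	if (prog_attached && !prog_has_frags)
		features &= ~F_LRO;

	return features;
}
```

With a single-buffer program attached, fix_features(F_RXCSUM | F_LRO, true, false) returns only F_RXCSUM; with no program, or with a program that supports frags, the feature set is returned unchanged.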

The AQC chip supports 32 queues and 8 vectors (IRQs).
There are two options:
1. Up to 8 cores, with a maximum of 4 tx queues per core.
2. Up to 4 cores, with a maximum of 8 tx queues per core.

As in other drivers, these tx queues could be dedicated to XDP_TX and
XDP_REDIRECT. In that case, no tx_lock would be needed.
But this patchset doesn't use this strategy because the cost of getting
the hardware tx queue index is too high.
So, tx_lock is used in aq_nic_xmit_xdpf().
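
As a quick check of the queue budget behind the two options, here is a small sketch. The constant and the function name are illustrative, taken only from the numbers quoted above, and are not driver symbols.

```c
#include <stdbool.h>

#define HW_TX_QUEUES 32u /* queue count quoted in the cover letter */

/* True if giving each core `per_core` tx queues stays within the 32 queues. */
static bool fits_queue_budget(unsigned int cores, unsigned int per_core)
{
	return cores * per_core <= HW_TX_QUEUES;
}
```

Both 8 cores x 4 queues and 4 cores x 8 queues use exactly 32 queues, while 8 cores x 8 queues would not fit.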

single-core, single queue, 80% cpu utilization.

  32.30%  [kernel]                  [k] aq_get_rxpages_xdp
  10.44%  [kernel]                  [k] aq_hw_read_reg <---------- here
   9.86%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
   5.51%  [kernel]                  [k] aq_ring_rx_clean

single-core, 8 queues, 100% cpu utilization, half PPS.

  52.03%  [kernel]                  [k] aq_hw_read_reg <---------- here
  18.24%  [kernel]                  [k] aq_get_rxpages_xdp
   4.30%  [kernel]                  [k] hw_atl_b0_hw_ring_rx_receive
   4.24%  bpf_prog_xxx_xdp_prog_tx  [k] bpf_prog_xxx_xdp_prog_tx
   2.79%  [kernel]                  [k] aq_ring_rx_clean

Performance results (64-byte packets)
1. XDP_TX
  a. xdp_generic, single core
    - 2.5Mpps, 100% cpu
  b. xdp_driver, single core
    - 4.5Mpps, 80% cpu
  c. xdp_generic, 8 core(hyper thread)
    - 6.3Mpps, 40% cpu
  d. xdp_driver, 8 core(hyper thread)
    - 6.3Mpps, 30% cpu

2. XDP_REDIRECT
  a. xdp_generic, single core
    - 2.3Mpps
  b. xdp_driver, single core
    - 4.5Mpps

v5:
 - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
 - Use 2K frame size instead of 3K
 - Use order-2 page allocation instead of order-0
 - Rename aq_get_rxpage() to aq_alloc_rxpages()
 - Add missing PageFree stats for ethtool
 - Remove aq_unset_rxpage_xdp(), introduced by v2 patch due to
   change of memory model
 - Fix wrong last parameter value of xdp_prepare_buff()
 - Add aq_get_rxpages_xdp() to increase page reference count

v4:
 - Fix compile warning

v3:
 - Change wrong PPS performance result 40% -> 80% in single
   core(Intel i3-12100)
 - Separate aq_nic_map_xdp() from aq_nic_map_skb()
 - Drop multi buffer packets if single buffer XDP is attached
 - Disable LRO when single buffer XDP is attached
 - Use xdp_get_{frame/buff}_len()

v2:
 - Do not use inline in C file

Taehee Yoo (3):
  net: atlantic: Implement xdp control plane
  net: atlantic: Implement xdp data plane
  net: atlantic: Implement .ndo_xdp_xmit handler

 .../net/ethernet/aquantia/atlantic/aq_cfg.h   |   1 +
 .../ethernet/aquantia/atlantic/aq_ethtool.c   |   9 +
 .../net/ethernet/aquantia/atlantic/aq_main.c  |  87 ++++
 .../net/ethernet/aquantia/atlantic/aq_main.h  |   2 +
 .../net/ethernet/aquantia/atlantic/aq_nic.c   | 136 ++++++
 .../net/ethernet/aquantia/atlantic/aq_nic.h   |   5 +
 .../net/ethernet/aquantia/atlantic/aq_ring.c  | 409 ++++++++++++++++--
 .../net/ethernet/aquantia/atlantic/aq_ring.h  |  21 +-
 .../net/ethernet/aquantia/atlantic/aq_vec.c   |  23 +-
 .../net/ethernet/aquantia/atlantic/aq_vec.h   |   6 +
 .../aquantia/atlantic/hw_atl/hw_atl_a0.c      |   6 +-
 .../aquantia/atlantic/hw_atl/hw_atl_b0.c      |  10 +-
 12 files changed, 670 insertions(+), 45 deletions(-)

--
2.17.1

* [PATCH net-next v5 1/3] net: atlantic: Implement xdp control plane
From: Taehee Yoo @ 2022-04-17 10:12 UTC (permalink / raw)
  To: davem, kuba, pabeni, netdev, irusskikh, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh, bpf
  Cc: ap420073

aq_xdp() is the XDP setup callback for the Atlantic driver.
When XDP is attached or detached, the device is restarted because
it uses different headroom, tailroom, and page order values.
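
A minimal sketch of the restart decision, assuming the device only needs a restart when XDP goes from "no program" to "program" or back, and not when one program replaces another. The function name is hypothetical and the programs are modelled as opaque pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Restart only if the interface is running and the presence of an XDP
 * program changes (attach or detach); swapping programs needs no restart.
 */
static bool needs_restart(const void *old_prog, const void *new_prog,
			  bool running)
{
	bool need_update = (old_prog != NULL) != (new_prog != NULL);

	return running && need_update;
}
```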

If XDP is enabled, the default page order switches from 0 to 2,
because the default maximum frame size is still 2K and additional
space is needed for headroom and tailroom.
The total size (headroom + frame size + tailroom) is 2624 bytes.
With an order-0 page, 1472 bytes would be wasted for every frame.
With order-2, each page can be used 6 times with the flip strategy,
so only about 106 bytes per frame are wasted.
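
The waste figures can be reproduced with a short calculation. This sketch assumes 4 KiB base pages and uses the 2624-byte total from the description; the macro and function names are illustrative only.

```c
/* Values from the patch description; a 4 KiB base page is assumed. */
#define PAGE_SIZE_BYTES 4096u
#define XDP_BUF_SIZE    2624u /* headroom + 2K frame + tailroom */

/* Number of XDP_BUF_SIZE buffers that fit in a page of the given order. */
static unsigned int bufs_per_page(unsigned int order)
{
	return (PAGE_SIZE_BYTES << order) / XDP_BUF_SIZE;
}

/* Average unused bytes per buffer for a page of the given order. */
static unsigned int waste_per_buf(unsigned int order)
{
	unsigned int page_bytes = PAGE_SIZE_BYTES << order;
	unsigned int n = bufs_per_page(order);

	return (page_bytes - n * XDP_BUF_SIZE) / n;
}
```

For order 0 this gives 1 buffer per page and 4096 - 2624 = 1472 wasted bytes. For order 2 it gives 6 buffers per 16384-byte page, leaving 16384 - 6 * 2624 = 640 bytes, or about 106 bytes per buffer.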

It also supports the XDP fragment feature.
The MTU can be 16K if the XDP prog supports XDP fragments.
If not, the MTU cannot exceed 2K - ETH_HLEN - ETH_FCS_LEN.
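
The single-buffer MTU limit can be sketched as a standalone check. It loosely mirrors the frame-size test on the MTU-change path, but it is a userspace model: RX_FRAME_MAX is assumed to be the 2K frame size mentioned above, and mtu_allowed() is a hypothetical name.

```c
#include <stdbool.h>

#define ETH_HLEN     14   /* Ethernet header length in bytes */
#define ETH_FCS_LEN  4    /* frame check sequence length in bytes */
#define RX_FRAME_MAX 2048 /* assumed: the 2K frame size from the description */

/*
 * An MTU is rejected only when a program without frags support is
 * attached and the resulting frame would not fit in one 2K buffer.
 */
static bool mtu_allowed(int new_mtu, bool prog_attached, bool prog_has_frags)
{
	int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN;

	if (prog_attached && !prog_has_frags && frame_size > RX_FRAME_MAX)
		return false;

	return true;
}
```

Under these assumptions the largest MTU accepted with a single-buffer program is 2048 - 14 - 4 = 2030.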

A static key is also added. It will be used to call the xdp_clean
handler in ->poll(). The data plane implementation is contained in
the following patch.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---

v5:
 - Use MEM_TYPE_PAGE_SHARED instead of MEM_TYPE_PAGE_ORDER0
 - Use 2K frame size instead of 3K
 - Use order-2 page allocation instead of order-0
 - Rename aq_get_rxpage() to aq_alloc_rxpages()

v4:
 - No change

v3:
 - Disable LRO when single buffer XDP is attached

v2:
 - No change
David S. Miller committed Apr 20, 2022
2 parents 8ab38ed + 45638f0 commit e97e917
Showing 12 changed files with 670 additions and 45 deletions.
1 change: 1 addition & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_cfg.h
@@ -40,6 +40,7 @@
#define AQ_CFG_RX_HDR_SIZE 256U

#define AQ_CFG_RX_PAGEORDER 0U
#define AQ_CFG_XDP_PAGEORDER 2U

/* LRO */
#define AQ_CFG_IS_LRO_DEF 1U
9 changes: 9 additions & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -97,6 +97,15 @@ static const char * const aq_ethtool_queue_rx_stat_names[] = {
	"%sQueue[%d] AllocFails",
	"%sQueue[%d] SkbAllocFails",
	"%sQueue[%d] Polls",
	"%sQueue[%d] PageFlips",
	"%sQueue[%d] PageReuses",
	"%sQueue[%d] PageFrees",
	"%sQueue[%d] XdpAbort",
	"%sQueue[%d] XdpDrop",
	"%sQueue[%d] XdpPass",
	"%sQueue[%d] XdpTx",
	"%sQueue[%d] XdpInvalid",
	"%sQueue[%d] XdpRedirect",
};

static const char * const aq_ethtool_queue_tx_stat_names[] = {
87 changes: 87 additions & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_main.c
@@ -14,17 +14,22 @@
#include "aq_ptp.h"
#include "aq_filters.h"
#include "aq_hw_utils.h"
#include "aq_vec.h"

#include <linux/netdevice.h>
#include <linux/module.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <net/pkt_cls.h>
#include <linux/filter.h>

MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(AQ_CFG_DRV_AUTHOR);
MODULE_DESCRIPTION(AQ_CFG_DRV_DESC);

DEFINE_STATIC_KEY_FALSE(aq_xdp_locking_key);
EXPORT_SYMBOL(aq_xdp_locking_key);

static const char aq_ndev_driver_name[] = AQ_CFG_DRV_NAME;

static const struct net_device_ops aq_ndev_ops;
@@ -126,9 +131,19 @@ static netdev_tx_t aq_ndev_start_xmit(struct sk_buff *skb, struct net_device *nd

static int aq_ndev_change_mtu(struct net_device *ndev, int new_mtu)
{
	int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN;
	struct aq_nic_s *aq_nic = netdev_priv(ndev);
	struct bpf_prog *prog;
	int err;

	prog = READ_ONCE(aq_nic->xdp_prog);
	if (prog && !prog->aux->xdp_has_frags &&
	    new_frame_size > AQ_CFG_RX_FRAME_MAX) {
		netdev_err(ndev, "Illegal MTU %d for XDP prog without frags\n",
			   ndev->mtu);
		return -EOPNOTSUPP;
	}

	err = aq_nic_set_mtu(aq_nic, new_mtu + ETH_HLEN);

	if (err < 0)
@@ -204,6 +219,25 @@ static int aq_ndev_set_features(struct net_device *ndev,
	return err;
}

static netdev_features_t aq_ndev_fix_features(struct net_device *ndev,
					      netdev_features_t features)
{
	struct aq_nic_s *aq_nic = netdev_priv(ndev);
	struct bpf_prog *prog;

	if (!(features & NETIF_F_RXCSUM))
		features &= ~NETIF_F_LRO;

	prog = READ_ONCE(aq_nic->xdp_prog);
	if (prog && !prog->aux->xdp_has_frags &&
	    aq_nic->xdp_prog && features & NETIF_F_LRO) {
		netdev_err(ndev, "LRO is not supported with single buffer XDP, disabling\n");
		features &= ~NETIF_F_LRO;
	}

	return features;
}

static int aq_ndev_set_mac_address(struct net_device *ndev, void *addr)
{
struct aq_nic_s *aq_nic = netdev_priv(ndev);
@@ -410,6 +444,56 @@ static int aq_ndo_setup_tc(struct net_device *dev, enum tc_setup_type type,
					mqprio->qopt.prio_tc_map);
}

static int aq_xdp_setup(struct net_device *ndev, struct bpf_prog *prog,
			struct netlink_ext_ack *extack)
{
	bool need_update, running = netif_running(ndev);
	struct aq_nic_s *aq_nic = netdev_priv(ndev);
	struct bpf_prog *old_prog;

	if (prog && !prog->aux->xdp_has_frags) {
		if (ndev->mtu > AQ_CFG_RX_FRAME_MAX) {
			NL_SET_ERR_MSG_MOD(extack,
					   "prog does not support XDP frags");
			return -EOPNOTSUPP;
		}

		if (prog && ndev->features & NETIF_F_LRO) {
			netdev_err(ndev,
				   "LRO is not supported with single buffer XDP, disabling\n");
			ndev->features &= ~NETIF_F_LRO;
		}
	}

	need_update = !!aq_nic->xdp_prog != !!prog;
	if (running && need_update)
		aq_ndev_close(ndev);

	old_prog = xchg(&aq_nic->xdp_prog, prog);
	if (old_prog)
		bpf_prog_put(old_prog);

	if (!old_prog && prog)
		static_branch_inc(&aq_xdp_locking_key);
	else if (old_prog && !prog)
		static_branch_dec(&aq_xdp_locking_key);

	if (running && need_update)
		return aq_ndev_open(ndev);

	return 0;
}

static int aq_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
	switch (xdp->command) {
	case XDP_SETUP_PROG:
		return aq_xdp_setup(dev, xdp->prog, xdp->extack);
	default:
		return -EINVAL;
	}
}

static const struct net_device_ops aq_ndev_ops = {
	.ndo_open = aq_ndev_open,
	.ndo_stop = aq_ndev_close,
@@ -418,10 +502,13 @@ static const struct net_device_ops aq_ndev_ops = {
	.ndo_change_mtu = aq_ndev_change_mtu,
	.ndo_set_mac_address = aq_ndev_set_mac_address,
	.ndo_set_features = aq_ndev_set_features,
	.ndo_fix_features = aq_ndev_fix_features,
	.ndo_eth_ioctl = aq_ndev_ioctl,
	.ndo_vlan_rx_add_vid = aq_ndo_vlan_rx_add_vid,
	.ndo_vlan_rx_kill_vid = aq_ndo_vlan_rx_kill_vid,
	.ndo_setup_tc = aq_ndo_setup_tc,
	.ndo_bpf = aq_xdp,
	.ndo_xdp_xmit = aq_xdp_xmit,
};

static int __init aq_ndev_init_module(void)
2 changes: 2 additions & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_main.h
@@ -12,6 +12,8 @@
#include "aq_common.h"
#include "aq_nic.h"

DECLARE_STATIC_KEY_FALSE(aq_xdp_locking_key);

void aq_ndev_schedule_work(struct work_struct *work);
struct net_device *aq_ndev_alloc(void);

136 changes: 136 additions & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -569,6 +569,103 @@ int aq_nic_start(struct aq_nic_s *self)
	return err;
}

static unsigned int aq_nic_map_xdp(struct aq_nic_s *self,
				   struct xdp_frame *xdpf,
				   struct aq_ring_s *ring)
{
	struct device *dev = aq_nic_get_dev(self);
	struct aq_ring_buff_s *first = NULL;
	unsigned int dx = ring->sw_tail;
	struct aq_ring_buff_s *dx_buff;
	struct skb_shared_info *sinfo;
	unsigned int frag_count = 0U;
	unsigned int nr_frags = 0U;
	unsigned int ret = 0U;
	u16 total_len;

	dx_buff = &ring->buff_ring[dx];
	dx_buff->flags = 0U;

	sinfo = xdp_get_shared_info_from_frame(xdpf);
	total_len = xdpf->len;
	dx_buff->len = total_len;
	if (xdp_frame_has_frags(xdpf)) {
		nr_frags = sinfo->nr_frags;
		total_len += sinfo->xdp_frags_size;
	}
	dx_buff->pa = dma_map_single(dev, xdpf->data, dx_buff->len,
				     DMA_TO_DEVICE);

	if (unlikely(dma_mapping_error(dev, dx_buff->pa)))
		goto exit;

	first = dx_buff;
	dx_buff->len_pkt = total_len;
	dx_buff->is_sop = 1U;
	dx_buff->is_mapped = 1U;
	++ret;

	for (; nr_frags--; ++frag_count) {
		skb_frag_t *frag = &sinfo->frags[frag_count];
		unsigned int frag_len = skb_frag_size(frag);
		unsigned int buff_offset = 0U;
		unsigned int buff_size = 0U;
		dma_addr_t frag_pa;

		while (frag_len) {
			if (frag_len > AQ_CFG_TX_FRAME_MAX)
				buff_size = AQ_CFG_TX_FRAME_MAX;
			else
				buff_size = frag_len;

			frag_pa = skb_frag_dma_map(dev, frag, buff_offset,
						   buff_size, DMA_TO_DEVICE);

			if (unlikely(dma_mapping_error(dev, frag_pa)))
				goto mapping_error;

			dx = aq_ring_next_dx(ring, dx);
			dx_buff = &ring->buff_ring[dx];

			dx_buff->flags = 0U;
			dx_buff->len = buff_size;
			dx_buff->pa = frag_pa;
			dx_buff->is_mapped = 1U;
			dx_buff->eop_index = 0xffffU;

			frag_len -= buff_size;
			buff_offset += buff_size;

			++ret;
		}
	}

	first->eop_index = dx;
	dx_buff->is_eop = 1U;
	dx_buff->skb = NULL;
	dx_buff->xdpf = xdpf;
	goto exit;

mapping_error:
	for (dx = ring->sw_tail;
	     ret > 0;
	     --ret, dx = aq_ring_next_dx(ring, dx)) {
		dx_buff = &ring->buff_ring[dx];

		if (!dx_buff->pa)
			continue;
		if (unlikely(dx_buff->is_sop))
			dma_unmap_single(dev, dx_buff->pa, dx_buff->len,
					 DMA_TO_DEVICE);
		else
			dma_unmap_page(dev, dx_buff->pa, dx_buff->len,
				       DMA_TO_DEVICE);
	}

exit:
	return ret;
}

unsigned int aq_nic_map_skb(struct aq_nic_s *self, struct sk_buff *skb,
			    struct aq_ring_s *ring)
{
@@ -697,6 +794,7 @@ unsigned int aq_nic_map_skb(struct aq_nic_s *self, struct sk_buff *skb,
	first->eop_index = dx;
	dx_buff->is_eop = 1U;
	dx_buff->skb = skb;
	dx_buff->xdpf = NULL;
	goto exit;

mapping_error:
@@ -725,6 +823,44 @@ unsigned int aq_nic_map_skb(struct aq_nic_s *self, struct sk_buff *skb,
	return ret;
}

int aq_nic_xmit_xdpf(struct aq_nic_s *aq_nic, struct aq_ring_s *tx_ring,
		     struct xdp_frame *xdpf)
{
	u16 queue_index = AQ_NIC_RING2QMAP(aq_nic, tx_ring->idx);
	struct net_device *ndev = aq_nic_get_ndev(aq_nic);
	struct skb_shared_info *sinfo;
	int cpu = smp_processor_id();
	int err = NETDEV_TX_BUSY;
	struct netdev_queue *nq;
	unsigned int frags = 1;

	if (xdp_frame_has_frags(xdpf)) {
		sinfo = xdp_get_shared_info_from_frame(xdpf);
		frags += sinfo->nr_frags;
	}

	if (frags > AQ_CFG_SKB_FRAGS_MAX)
		return err;

	nq = netdev_get_tx_queue(ndev, tx_ring->idx);
	__netif_tx_lock(nq, cpu);

	aq_ring_update_queue_state(tx_ring);

	/* Above status update may stop the queue. Check this. */
	if (__netif_subqueue_stopped(aq_nic_get_ndev(aq_nic), queue_index))
		goto out;

	frags = aq_nic_map_xdp(aq_nic, xdpf, tx_ring);
	if (likely(frags))
		err = aq_nic->aq_hw_ops->hw_ring_tx_xmit(aq_nic->aq_hw, tx_ring,
							 frags);
out:
	__netif_tx_unlock(nq);

	return err;
}

int aq_nic_xmit(struct aq_nic_s *self, struct sk_buff *skb)
{
	struct aq_nic_cfg_s *cfg = aq_nic_get_cfg(self);
5 changes: 5 additions & 0 deletions drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -11,6 +11,8 @@
#define AQ_NIC_H

#include <linux/ethtool.h>
#include <net/xdp.h>
#include <linux/bpf.h>

#include "aq_common.h"
#include "aq_rss.h"
@@ -128,6 +130,7 @@ struct aq_nic_s {
	struct aq_vec_s *aq_vec[AQ_CFG_VECS_MAX];
	struct aq_ring_s *aq_ring_tx[AQ_HW_QUEUES_MAX];
	struct aq_hw_s *aq_hw;
	struct bpf_prog *xdp_prog;
	struct net_device *ndev;
	unsigned int aq_vecs;
	unsigned int packet_filter;
@@ -177,6 +180,8 @@ void aq_nic_ndev_free(struct aq_nic_s *self);
int aq_nic_start(struct aq_nic_s *self);
unsigned int aq_nic_map_skb(struct aq_nic_s *self, struct sk_buff *skb,
			    struct aq_ring_s *ring);
int aq_nic_xmit_xdpf(struct aq_nic_s *aq_nic, struct aq_ring_s *tx_ring,
		     struct xdp_frame *xdpf);
int aq_nic_xmit(struct aq_nic_s *self, struct sk_buff *skb);
int aq_nic_get_regs(struct aq_nic_s *self, struct ethtool_regs *regs, void *p);
int aq_nic_get_regs_count(struct aq_nic_s *self);