Skip to content

Commit

Permalink
Merge branch 'udp_tunnel-add-NIC-RX-port-offload-infrastructure'
Browse files Browse the repository at this point in the history
Jakub Kicinski says:

====================
udp_tunnel: add NIC RX port offload infrastructure

Kernel has a facility to notify drivers about the UDP tunnel ports
so that devices can recognize tunneled packets. This is important
mostly for RX - devices which don't support CHECKSUM_COMPLETE can
report checksums of inner packets, and compute RSS over inner headers.
Some drivers also match the UDP tunnel ports also for TX, although
doing so may lead to false positives and negatives.

Unfortunately the user experience when trying to take adavantage
of these facilities is suboptimal. First of all there is no way
for users to check which ports are offloaded. Many drivers resort
to printing messages to aid debugging, other use debugfs. Even worse
the availability of the RX features (NETIF_F_RX_UDP_TUNNEL_PORT)
is established purely on the basis of the driver having the ndos
installed. For most drivers, however, the ability to perform offloads
is contingent on device capabilities (driver support multiple device
and firmware versions). Unless driver resorts to hackish clearing
of features set incorrectly by the core - users are left guessing
whether their device really supports UDP tunnel port offload or not.

There is currently no way to indicate or configure whether RX
features include just the checksum offload or checksum and using
inner headers for RSS. Many drivers default to not using inner
headers for RSS because most implementations populate the source
port with entropy from the inner headers. This, however, is not
always the case, for example certain switches are only able to
use a fixed source port during encapsulation.

We have also seen many driver authors get the intricacies of UDP
tunnel port offloads wrong. Most commonly the drivers forget to
perform reference counting, or take sleeping locks in the callbacks.

This work tries to improve the situation by pulling the UDP tunnel
port table maintenance out of the drivers. It turns out that almost
all drivers maintain a fixed size table of ports (in most cases one
per tunnel type), so we can take care of all the refcounting in the
core, and let the driver specify if they need to sleep in the
callbacks or not. The new common implementation will also support
replacing ports - when a port is removed from a full table it will
try to find a previously missing port to take its place.

This patch only implements the core functionality along with a few
drivers I was hoping to test manually [1] along with a test based
on a netdevsim implementation. Following patches will convert all
the drivers. Once that's complete we can remove the ndos, and rely
directly on the new infrastrucutre.

Then after RSS (RXFH) is converted to netlink we can add the ability
to configure the use of inner RSS headers for UDP tunnels.

[1] Unfortunately I wasn't able to, turns out 2 of the devices
I had access to were older generation or had old FW, and they
did not actually support UDP tunnel port notifications (see
the second paragraph). The thrid device appears to program
the UDP ports correctly but it generates bad UDP checksums with
or without these patches. Long story short - I'd appreciate
reviews and testing here..

v4:
 - better build fix (hopefully this one does it..)
v3:
 - fix build issue;
 - improve bnxt changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jul 10, 2020
2 parents 8fb49c0 + fb6f897 commit 0ea4604
Show file tree
Hide file tree
Showing 35 changed files with 2,608 additions and 397 deletions.
12 changes: 8 additions & 4 deletions Documentation/filesystems/debugfs.rst
Original file line number Diff line number Diff line change
Expand Up @@ -185,13 +185,17 @@ byte offsets over a base for the register block.

If you want to dump an u32 array in debugfs, you can create file with::

struct debugfs_u32_array {
u32 *array;
u32 n_elements;
};

void debugfs_create_u32_array(const char *name, umode_t mode,
struct dentry *parent,
u32 *array, u32 elements);
struct debugfs_u32_array *array);

The "array" argument provides data, and the "elements" argument is
the number of elements in the array. Note: Once array is created its
size can not be changed.
The "array" argument wraps a pointer to the array's data and the number
of its elements. Note: Once array is created its size can not be changed.

There is a helper function to create device related seq_file::

Expand Down
33 changes: 33 additions & 0 deletions Documentation/networking/ethtool-netlink.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1230,6 +1230,39 @@ used to report the amplitude of the reflection for a given pair.
| | | ``ETHTOOL_A_CABLE_AMPLITUDE_mV`` | s16 | Reflection amplitude |
+-+-+-----------------------------------------+--------+----------------------+

TUNNEL_INFO
===========

Gets information about the tunnel state NIC is aware of.

Request contents:

===================================== ====== ==========================
``ETHTOOL_A_TUNNEL_INFO_HEADER`` nested request header
===================================== ====== ==========================

Kernel response contents:

+---------------------------------------------+--------+---------------------+
| ``ETHTOOL_A_TUNNEL_INFO_HEADER`` | nested | reply header |
+---------------------------------------------+--------+---------------------+
| ``ETHTOOL_A_TUNNEL_INFO_UDP_PORTS`` | nested | all UDP port tables |
+-+-------------------------------------------+--------+---------------------+
| | ``ETHTOOL_A_TUNNEL_UDP_TABLE`` | nested | one UDP port table |
+-+-+-----------------------------------------+--------+---------------------+
| | | ``ETHTOOL_A_TUNNEL_UDP_TABLE_SIZE`` | u32 | max size of the |
| | | | | table |
+-+-+-----------------------------------------+--------+---------------------+
| | | ``ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES`` | bitset | tunnel types which |
| | | | | table can hold |
+-+-+-----------------------------------------+--------+---------------------+
| | | ``ETHTOOL_A_TUNNEL_UDP_TABLE_ENTRY`` | nested | offloaded UDP port |
+-+-+-+---------------------------------------+--------+---------------------+
| | | | ``ETHTOOL_A_TUNNEL_UDP_ENTRY_PORT`` | be16 | UDP port |
+-+-+-+---------------------------------------+--------+---------------------+
| | | | ``ETHTOOL_A_TUNNEL_UDP_ENTRY_TYPE`` | u32 | tunnel type |
+-+-+-+---------------------------------------+--------+---------------------+

Request translation
===================

Expand Down
141 changes: 38 additions & 103 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.c
Original file line number Diff line number Diff line change
Expand Up @@ -4509,10 +4509,12 @@ static int bnxt_hwrm_tunnel_dst_port_free(struct bnxt *bp, u8 tunnel_type)

switch (tunnel_type) {
case TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN:
req.tunnel_dst_port_id = bp->vxlan_fw_dst_port_id;
req.tunnel_dst_port_id = cpu_to_le16(bp->vxlan_fw_dst_port_id);
bp->vxlan_fw_dst_port_id = INVALID_HW_RING_ID;
break;
case TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE:
req.tunnel_dst_port_id = bp->nge_fw_dst_port_id;
req.tunnel_dst_port_id = cpu_to_le16(bp->nge_fw_dst_port_id);
bp->nge_fw_dst_port_id = INVALID_HW_RING_ID;
break;
default:
break;
Expand Down Expand Up @@ -4547,10 +4549,11 @@ static int bnxt_hwrm_tunnel_dst_port_alloc(struct bnxt *bp, __be16 port,

switch (tunnel_type) {
case TUNNEL_DST_PORT_ALLOC_REQ_TUNNEL_TYPE_VXLAN:
bp->vxlan_fw_dst_port_id = resp->tunnel_dst_port_id;
bp->vxlan_fw_dst_port_id =
le16_to_cpu(resp->tunnel_dst_port_id);
break;
case TUNNEL_DST_PORT_ALLOC_REQ_TUNNEL_TYPE_GENEVE:
bp->nge_fw_dst_port_id = resp->tunnel_dst_port_id;
bp->nge_fw_dst_port_id = le16_to_cpu(resp->tunnel_dst_port_id);
break;
default:
break;
Expand Down Expand Up @@ -7578,16 +7581,12 @@ static int bnxt_hwrm_pcie_qstats(struct bnxt *bp)

static void bnxt_hwrm_free_tunnel_ports(struct bnxt *bp)
{
if (bp->vxlan_port_cnt) {
if (bp->vxlan_fw_dst_port_id != INVALID_HW_RING_ID)
bnxt_hwrm_tunnel_dst_port_free(
bp, TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN);
}
bp->vxlan_port_cnt = 0;
if (bp->nge_port_cnt) {
if (bp->nge_fw_dst_port_id != INVALID_HW_RING_ID)
bnxt_hwrm_tunnel_dst_port_free(
bp, TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE);
}
bp->nge_port_cnt = 0;
}

static int bnxt_set_tpa(struct bnxt *bp, bool set_tpa)
Expand Down Expand Up @@ -9305,7 +9304,7 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
}

if (irq_re_init)
udp_tunnel_get_rx_info(bp->dev);
udp_tunnel_nic_reset_ntf(bp->dev);

set_bit(BNXT_STATE_OPEN, &bp->state);
bnxt_enable_int(bp);
Expand Down Expand Up @@ -10456,24 +10455,6 @@ static void bnxt_sp_task(struct work_struct *work)
bnxt_cfg_ntp_filters(bp);
if (test_and_clear_bit(BNXT_HWRM_EXEC_FWD_REQ_SP_EVENT, &bp->sp_event))
bnxt_hwrm_exec_fwd_req(bp);
if (test_and_clear_bit(BNXT_VXLAN_ADD_PORT_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_tunnel_dst_port_alloc(
bp, bp->vxlan_port,
TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN);
}
if (test_and_clear_bit(BNXT_VXLAN_DEL_PORT_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_tunnel_dst_port_free(
bp, TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN);
}
if (test_and_clear_bit(BNXT_GENEVE_ADD_PORT_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_tunnel_dst_port_alloc(
bp, bp->nge_port,
TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE);
}
if (test_and_clear_bit(BNXT_GENEVE_DEL_PORT_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_tunnel_dst_port_free(
bp, TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE);
}
if (test_and_clear_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event)) {
bnxt_hwrm_port_qstats(bp);
bnxt_hwrm_port_qstats_ext(bp);
Expand Down Expand Up @@ -11070,6 +11051,9 @@ static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
timer_setup(&bp->timer, bnxt_timer, 0);
bp->current_interval = BNXT_TIMER_INTERVAL;

bp->vxlan_fw_dst_port_id = INVALID_HW_RING_ID;
bp->nge_fw_dst_port_id = INVALID_HW_RING_ID;

clear_bit(BNXT_STATE_OPEN, &bp->state);
return 0;

Expand Down Expand Up @@ -11397,84 +11381,33 @@ static void bnxt_cfg_ntp_filters(struct bnxt *bp)

#endif /* CONFIG_RFS_ACCEL */

static void bnxt_udp_tunnel_add(struct net_device *dev,
struct udp_tunnel_info *ti)
static int bnxt_udp_tunnel_sync(struct net_device *netdev, unsigned int table)
{
struct bnxt *bp = netdev_priv(dev);

if (ti->sa_family != AF_INET6 && ti->sa_family != AF_INET)
return;

if (!netif_running(dev))
return;
struct bnxt *bp = netdev_priv(netdev);
struct udp_tunnel_info ti;
unsigned int cmd;

switch (ti->type) {
case UDP_TUNNEL_TYPE_VXLAN:
if (bp->vxlan_port_cnt && bp->vxlan_port != ti->port)
return;
udp_tunnel_nic_get_port(netdev, table, 0, &ti);
if (ti.type == UDP_TUNNEL_TYPE_VXLAN)
cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_VXLAN;
else
cmd = TUNNEL_DST_PORT_FREE_REQ_TUNNEL_TYPE_GENEVE;

bp->vxlan_port_cnt++;
if (bp->vxlan_port_cnt == 1) {
bp->vxlan_port = ti->port;
set_bit(BNXT_VXLAN_ADD_PORT_SP_EVENT, &bp->sp_event);
bnxt_queue_sp_work(bp);
}
break;
case UDP_TUNNEL_TYPE_GENEVE:
if (bp->nge_port_cnt && bp->nge_port != ti->port)
return;
if (ti.port)
return bnxt_hwrm_tunnel_dst_port_alloc(bp, ti.port, cmd);

bp->nge_port_cnt++;
if (bp->nge_port_cnt == 1) {
bp->nge_port = ti->port;
set_bit(BNXT_GENEVE_ADD_PORT_SP_EVENT, &bp->sp_event);
}
break;
default:
return;
}

bnxt_queue_sp_work(bp);
return bnxt_hwrm_tunnel_dst_port_free(bp, cmd);
}

static void bnxt_udp_tunnel_del(struct net_device *dev,
struct udp_tunnel_info *ti)
{
struct bnxt *bp = netdev_priv(dev);

if (ti->sa_family != AF_INET6 && ti->sa_family != AF_INET)
return;

if (!netif_running(dev))
return;

switch (ti->type) {
case UDP_TUNNEL_TYPE_VXLAN:
if (!bp->vxlan_port_cnt || bp->vxlan_port != ti->port)
return;
bp->vxlan_port_cnt--;

if (bp->vxlan_port_cnt != 0)
return;

set_bit(BNXT_VXLAN_DEL_PORT_SP_EVENT, &bp->sp_event);
break;
case UDP_TUNNEL_TYPE_GENEVE:
if (!bp->nge_port_cnt || bp->nge_port != ti->port)
return;
bp->nge_port_cnt--;

if (bp->nge_port_cnt != 0)
return;

set_bit(BNXT_GENEVE_DEL_PORT_SP_EVENT, &bp->sp_event);
break;
default:
return;
}

bnxt_queue_sp_work(bp);
}
static const struct udp_tunnel_nic_info bnxt_udp_tunnels = {
.sync_table = bnxt_udp_tunnel_sync,
.flags = UDP_TUNNEL_NIC_INFO_MAY_SLEEP |
UDP_TUNNEL_NIC_INFO_OPEN_ONLY,
.tables = {
{ .n_entries = 1, .tunnel_types = UDP_TUNNEL_TYPE_VXLAN, },
{ .n_entries = 1, .tunnel_types = UDP_TUNNEL_TYPE_GENEVE, },
},
};

static int bnxt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
struct net_device *dev, u32 filter_mask,
Expand Down Expand Up @@ -11572,8 +11505,8 @@ static const struct net_device_ops bnxt_netdev_ops = {
#ifdef CONFIG_RFS_ACCEL
.ndo_rx_flow_steer = bnxt_rx_flow_steer,
#endif
.ndo_udp_tunnel_add = bnxt_udp_tunnel_add,
.ndo_udp_tunnel_del = bnxt_udp_tunnel_del,
.ndo_udp_tunnel_add = udp_tunnel_nic_add_port,
.ndo_udp_tunnel_del = udp_tunnel_nic_del_port,
.ndo_bpf = bnxt_xdp,
.ndo_xdp_xmit = bnxt_xdp_xmit,
.ndo_bridge_getlink = bnxt_bridge_getlink,
Expand Down Expand Up @@ -12063,6 +11996,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_GRE |
NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_GRE_CSUM |
NETIF_F_GSO_IPXIP4 | NETIF_F_GSO_PARTIAL;
dev->udp_tunnel_nic_info = &bnxt_udp_tunnels;

dev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM |
NETIF_F_GSO_GRE_CSUM;
dev->vlan_features = dev->hw_features | NETIF_F_HIGHDMA;
Expand Down
12 changes: 2 additions & 10 deletions drivers/net/ethernet/broadcom/bnxt/bnxt.h
Original file line number Diff line number Diff line change
Expand Up @@ -1765,12 +1765,8 @@ struct bnxt {
((u64)(maj) << 48 | (u64)(min) << 32 | (u64)(bld) << 16 | (rsv))
#define BNXT_FW_MAJ(bp) ((bp)->fw_ver_code >> 48)

__be16 vxlan_port;
u8 vxlan_port_cnt;
__le16 vxlan_fw_dst_port_id;
__be16 nge_port;
u8 nge_port_cnt;
__le16 nge_fw_dst_port_id;
u16 vxlan_fw_dst_port_id;
u16 nge_fw_dst_port_id;
u8 port_partition_type;
u8 port_count;
u16 br_mode;
Expand All @@ -1790,16 +1786,12 @@ struct bnxt {
#define BNXT_RX_NTP_FLTR_SP_EVENT 1
#define BNXT_LINK_CHNG_SP_EVENT 2
#define BNXT_HWRM_EXEC_FWD_REQ_SP_EVENT 3
#define BNXT_VXLAN_ADD_PORT_SP_EVENT 4
#define BNXT_VXLAN_DEL_PORT_SP_EVENT 5
#define BNXT_RESET_TASK_SP_EVENT 6
#define BNXT_RST_RING_SP_EVENT 7
#define BNXT_HWRM_PF_UNLOAD_SP_EVENT 8
#define BNXT_PERIODIC_STATS_SP_EVENT 9
#define BNXT_HWRM_PORT_MODULE_SP_EVENT 10
#define BNXT_RESET_TASK_SILENT_SP_EVENT 11
#define BNXT_GENEVE_ADD_PORT_SP_EVENT 12
#define BNXT_GENEVE_DEL_PORT_SP_EVENT 13
#define BNXT_LINK_SPEED_CHNG_SP_EVENT 14
#define BNXT_FLOW_STATS_SP_EVENT 15
#define BNXT_UPDATE_PHY_SP_EVENT 16
Expand Down
3 changes: 0 additions & 3 deletions drivers/net/ethernet/intel/ixgbe/ixgbe.h
Original file line number Diff line number Diff line change
Expand Up @@ -588,11 +588,9 @@ struct ixgbe_adapter {
#define IXGBE_FLAG_FCOE_ENABLED BIT(21)
#define IXGBE_FLAG_SRIOV_CAPABLE BIT(22)
#define IXGBE_FLAG_SRIOV_ENABLED BIT(23)
#define IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE BIT(24)
#define IXGBE_FLAG_RX_HWTSTAMP_ENABLED BIT(25)
#define IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER BIT(26)
#define IXGBE_FLAG_DCB_CAPABLE BIT(27)
#define IXGBE_FLAG_GENEVE_OFFLOAD_CAPABLE BIT(28)

u32 flags2;
#define IXGBE_FLAG2_RSC_CAPABLE BIT(0)
Expand All @@ -606,7 +604,6 @@ struct ixgbe_adapter {
#define IXGBE_FLAG2_RSS_FIELD_IPV6_UDP BIT(9)
#define IXGBE_FLAG2_PTP_PPS_ENABLED BIT(10)
#define IXGBE_FLAG2_PHY_INTERRUPT BIT(11)
#define IXGBE_FLAG2_UDP_TUN_REREG_NEEDED BIT(12)
#define IXGBE_FLAG2_VLAN_PROMISC BIT(13)
#define IXGBE_FLAG2_EEE_CAPABLE BIT(14)
#define IXGBE_FLAG2_EEE_ENABLED BIT(15)
Expand Down
Loading

0 comments on commit 0ea4604

Please sign in to comment.