Skip to content

Commit

Permalink
Merge branch 'gro-fixed-id-gso-partial'
Browse files Browse the repository at this point in the history
Alexander Duyck says:

====================
GRO Fixed IPv4 ID support and GSO partial support

This patch series sets up a few different things.

First it adds support for GRO of frames with a fixed IP ID value.  This
will allow us to perform GRO for frames that go through things like an IPv6
to IPv4 header translation.

The second item we add is support for segmenting frames that are generated
this way.  Most devices only support an incrementing IP ID value, and in
the case of TCP the IP ID can be ignored in many cases since the DF bit
should be set.  So we can technically segment these frames using existing
TSO if we are willing to allow the IP ID to be mangled.  As such I have
added a matching feature for the new form of GRO/GSO called TCP IPv4 ID
mangling.  With this enabled we can assemble and disassemble a frame with
the sequence number fixed and the only ill effect will be that the IPv4 ID
will be altered which may or may not have any noticeable effect.  As such I
have defaulted the feature to disabled.

The third item this patch series adds is support for partial GSO
segmentation.  Partial GSO segmentation allows us to split a large frame
into two pieces.  The first piece will have an even multiple of MSS worth
of data and the headers before the one pointed to by csum_start will have
been updated so that they are correct for if the data payload had already
been segmented.  By doing this we can do things such as precompute the
outer header checksums for a frame to be segmented allowing us to perform
TSO on devices that don't support tunneling, or tunneling with outer header
checksums.

This patch set is based on the net-next tree, but I included "net: remove
netdevice gso_min_segs" in my tree as I assume it is likely to be applied
before this patch set will and I wanted to avoid a merge conflict.

v2: Fixed items reported by Jesse Gross
	fixed missing GSO flag in MPLS check
	adding DF check for MANGLEID
    Moved extra GSO feature checks into gso_features_check
    Rebased batches to account for "net: remove netdevice gso_min_segs"

Driver patches from the first patch set should still be compatible.  However
I do have a few changes in them so I will submit a v2 of those to Jeff
Kirsher once these patches are accepted into net-next.

Example driver patches for i40e, ixgbe, and igb:
https://patchwork.ozlabs.org/patch/608221/
https://patchwork.ozlabs.org/patch/608224/
https://patchwork.ozlabs.org/patch/608225/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Apr 14, 2016
2 parents cb68926 + f7a6272 commit edd93cd
Show file tree
Hide file tree
Showing 13 changed files with 395 additions and 54 deletions.
130 changes: 130 additions & 0 deletions Documentation/networking/segmentation-offloads.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
* TCP Segmentation Offload - TSO
* UDP Fragmentation Offload - UFO
* IPIP, SIT, GRE, and UDP Tunnel Offloads
* Generic Segmentation Offload - GSO
* Generic Receive Offload - GRO
* Partial Generic Segmentation Offload - GSO_PARTIAL

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation requested the bit for either SKB_GSO_TCP or
SKB_GSO_TCP6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able determine the offsets of the IP or IPv6 header and the
TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should
also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.

UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types were
introduced including SKB_GSO_IPIP, SKB_GSO_SIT, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there are more than just 1 set of headers. For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel:
Outer Inner
MAC skb_mac_header
Network skb_network_header skb_inner_network_header
Transport skb_transport_header

UDP/GRE Tunnel:
Outer Inner
MAC skb_mac_header skb_inner_mac_header
Network skb_network_header skb_inner_network_header
Transport skb_transport_header skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in the outer header.

Finally there is SKB_GSO_REMCSUM which indicates that a given tunnel header
has requested a remote checksum offload. In this case the inner headers
will be left with a partial checksum and only the outer header checksum
will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is IPv4 ID in the case that the DF bit is set for a given IP header.
If the value of the IPv4 ID is not sequentially incrementing it will be
altered so that it is when a frame assembled via GRO is segmented via GSO.

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the correct
values for if the header was simply duplicated. The one exception to this
is the outer IPv4 ID field. It is up to the device drivers to guarantee
that the IPv4 ID field is incremented in the case that a given header does
not have the DF bit set.
8 changes: 8 additions & 0 deletions include/linux/netdev_features.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ enum {
NETIF_F_UFO_BIT, /* ... UDPv4 fragmentation */
NETIF_F_GSO_ROBUST_BIT, /* ... ->SKB_GSO_DODGY */
NETIF_F_TSO_ECN_BIT, /* ... TCP ECN support */
NETIF_F_TSO_MANGLEID_BIT, /* ... IPV4 ID mangling allowed */
NETIF_F_TSO6_BIT, /* ... TCPv6 segmentation */
NETIF_F_FSO_BIT, /* ... FCoE segmentation */
NETIF_F_GSO_GRE_BIT, /* ... GRE with TSO */
Expand All @@ -47,6 +48,10 @@ enum {
NETIF_F_GSO_SIT_BIT, /* ... SIT tunnel with TSO */
NETIF_F_GSO_UDP_TUNNEL_BIT, /* ... UDP TUNNEL with TSO */
NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
NETIF_F_GSO_PARTIAL_BIT, /* ... Only segment inner-most L4
* in hardware and all other
* headers in software.
*/
NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
/**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
Expand Down Expand Up @@ -120,6 +125,8 @@ enum {
#define NETIF_F_GSO_SIT __NETIF_F(GSO_SIT)
#define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL)
#define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
#define NETIF_F_TSO_MANGLEID __NETIF_F(TSO_MANGLEID)
#define NETIF_F_GSO_PARTIAL __NETIF_F(GSO_PARTIAL)
#define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
#define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
#define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX)
Expand Down Expand Up @@ -147,6 +154,7 @@ enum {

/* List of features with software fallbacks. */
#define NETIF_F_GSO_SOFTWARE (NETIF_F_TSO | NETIF_F_TSO_ECN | \
NETIF_F_TSO_MANGLEID | \
NETIF_F_TSO6 | NETIF_F_UFO)

/* List of IP checksum features. Note that NETIF_F_ HW_CSUM should not be
Expand Down
8 changes: 7 additions & 1 deletion include/linux/netdevice.h
Original file line number Diff line number Diff line change
Expand Up @@ -1654,6 +1654,7 @@ struct net_device {
netdev_features_t vlan_features;
netdev_features_t hw_enc_features;
netdev_features_t mpls_features;
netdev_features_t gso_partial_features;

int ifindex;
int group;
Expand Down Expand Up @@ -2121,7 +2122,10 @@ struct napi_gro_cb {
/* Used in GRE, set in fou/gue_gro_receive */
u8 is_fou:1;

/* 6 bit hole */
/* Used to determine if flush_id can be ignored */
u8 is_atomic:1;

/* 5 bit hole */

/* used to support CHECKSUM_COMPLETE for tunneling protocols */
__wsum csum;
Expand Down Expand Up @@ -3992,6 +3996,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_FCOE != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_GRE != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
Expand All @@ -4000,6 +4005,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
BUILD_BUG_ON(SKB_GSO_SIT != (NETIF_F_GSO_SIT >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_PARTIAL != (NETIF_F_GSO_PARTIAL >> NETIF_F_GSO_SHIFT));
BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));

return (features & feature) == feature;
Expand Down
27 changes: 17 additions & 10 deletions include/linux/skbuff.h
Original file line number Diff line number Diff line change
Expand Up @@ -465,23 +465,27 @@ enum {
/* This indicates the tcp segment has CWR set. */
SKB_GSO_TCP_ECN = 1 << 3,

SKB_GSO_TCPV6 = 1 << 4,
SKB_GSO_TCP_FIXEDID = 1 << 4,

SKB_GSO_FCOE = 1 << 5,
SKB_GSO_TCPV6 = 1 << 5,

SKB_GSO_GRE = 1 << 6,
SKB_GSO_FCOE = 1 << 6,

SKB_GSO_GRE_CSUM = 1 << 7,
SKB_GSO_GRE = 1 << 7,

SKB_GSO_IPIP = 1 << 8,
SKB_GSO_GRE_CSUM = 1 << 8,

SKB_GSO_SIT = 1 << 9,
SKB_GSO_IPIP = 1 << 9,

SKB_GSO_UDP_TUNNEL = 1 << 10,
SKB_GSO_SIT = 1 << 10,

SKB_GSO_UDP_TUNNEL_CSUM = 1 << 11,
SKB_GSO_UDP_TUNNEL = 1 << 11,

SKB_GSO_TUNNEL_REMCSUM = 1 << 12,
SKB_GSO_UDP_TUNNEL_CSUM = 1 << 12,

SKB_GSO_PARTIAL = 1 << 13,

SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
};

#if BITS_PER_LONG > 32
Expand Down Expand Up @@ -3589,7 +3593,10 @@ static inline struct sec_path *skb_sec_path(struct sk_buff *skb)
* Keeps track of level of encapsulation of network headers.
*/
struct skb_gso_cb {
int mac_offset;
union {
int mac_offset;
int data_offset;
};
int encap_level;
__wsum csum;
__u16 csum_start;
Expand Down
67 changes: 61 additions & 6 deletions net/core/dev.c
Original file line number Diff line number Diff line change
Expand Up @@ -2711,6 +2711,19 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
return ERR_PTR(err);
}

/* Only report GSO partial support if it will enable us to
* support segmentation on this frame without needing additional
* work.
*/
if (features & NETIF_F_GSO_PARTIAL) {
netdev_features_t partial_features = NETIF_F_GSO_ROBUST;
struct net_device *dev = skb->dev;

partial_features |= dev->features & dev->gso_partial_features;
if (!skb_gso_ok(skb, features | partial_features))
features &= ~NETIF_F_GSO_PARTIAL;
}

BUILD_BUG_ON(SKB_SGO_CB_OFFSET +
sizeof(*SKB_GSO_CB(skb)) > sizeof(skb->cb));

Expand Down Expand Up @@ -2825,14 +2838,45 @@ static netdev_features_t dflt_features_check(const struct sk_buff *skb,
return vlan_features_check(skb, features);
}

static netdev_features_t gso_features_check(const struct sk_buff *skb,
struct net_device *dev,
netdev_features_t features)
{
u16 gso_segs = skb_shinfo(skb)->gso_segs;

if (gso_segs > dev->gso_max_segs)
return features & ~NETIF_F_GSO_MASK;

/* Support for GSO partial features requires software
* intervention before we can actually process the packets
* so we need to strip support for any partial features now
* and we can pull them back in after we have partially
* segmented the frame.
*/
if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL))
features &= ~dev->gso_partial_features;

/* Make sure to clear the IPv4 ID mangling feature if the
* IPv4 header has the potential to be fragmented.
*/
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
struct iphdr *iph = skb->encapsulation ?
inner_ip_hdr(skb) : ip_hdr(skb);

if (!(iph->frag_off & htons(IP_DF)))
features &= ~NETIF_F_TSO_MANGLEID;
}

return features;
}

netdev_features_t netif_skb_features(struct sk_buff *skb)
{
struct net_device *dev = skb->dev;
netdev_features_t features = dev->features;
u16 gso_segs = skb_shinfo(skb)->gso_segs;

if (gso_segs > dev->gso_max_segs)
features &= ~NETIF_F_GSO_MASK;
if (skb_is_gso(skb))
features = gso_features_check(skb, dev, features);

/* If encapsulation offload request, verify we are testing
* hardware encapsulation features instead of standard
Expand Down Expand Up @@ -4440,6 +4484,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->free = 0;
NAPI_GRO_CB(skb)->encap_mark = 0;
NAPI_GRO_CB(skb)->is_fou = 0;
NAPI_GRO_CB(skb)->is_atomic = 1;
NAPI_GRO_CB(skb)->gro_remcsum_start = 0;

/* Setup for GRO checksum validation */
Expand Down Expand Up @@ -6706,6 +6751,14 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
}
}

/* GSO partial features require GSO partial be set */
if ((features & dev->gso_partial_features) &&
!(features & NETIF_F_GSO_PARTIAL)) {
netdev_dbg(dev,
"Dropping partially supported GSO features since no GSO partial.\n");
features &= ~dev->gso_partial_features;
}

#ifdef CONFIG_NET_RX_BUSY_POLL
if (dev->netdev_ops->ndo_busy_poll)
features |= NETIF_F_BUSY_POLL;
Expand Down Expand Up @@ -6976,17 +7029,19 @@ int register_netdevice(struct net_device *dev)
dev->features |= NETIF_F_SOFT_FEATURES;
dev->wanted_features = dev->features & dev->hw_features;

if (!(dev->flags & IFF_LOOPBACK)) {
if (!(dev->flags & IFF_LOOPBACK))
dev->hw_features |= NETIF_F_NOCACHE_COPY;
}

if (dev->hw_features & NETIF_F_TSO)
dev->hw_features |= NETIF_F_TSO_MANGLEID;

/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.
*/
dev->vlan_features |= NETIF_F_HIGHDMA;

/* Make NETIF_F_SG inheritable to tunnel devices.
*/
dev->hw_enc_features |= NETIF_F_SG;
dev->hw_enc_features |= NETIF_F_SG | NETIF_F_GSO_PARTIAL;

/* Make NETIF_F_SG inheritable to MPLS.
*/
Expand Down
4 changes: 4 additions & 0 deletions net/core/ethtool.c
Original file line number Diff line number Diff line change
Expand Up @@ -79,12 +79,16 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_UFO_BIT] = "tx-udp-fragmentation",
[NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust",
[NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation",
[NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation",
[NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
[NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
[NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
[NETIF_F_GSO_GRE_CSUM_BIT] = "tx-gre-csum-segmentation",
[NETIF_F_GSO_IPIP_BIT] = "tx-ipip-segmentation",
[NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
[NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT] = "tx-udp_tnl-csum-segmentation",
[NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial",

[NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
[NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
Expand Down
Loading

0 comments on commit edd93cd

Please sign in to comment.