Merge branch 'bond_hash'
Nikolay Aleksandrov says:

====================
This is a complete remake of my old patch that modified the bonding hash
functions to use skb_flow_dissect which was suggested by Eric Dumazet.
This time around I've kept the old modes, although they now use a new hash
function, again suggested by Eric, which is the same for all modes. The only
difference is the way the headers are obtained. The old modes obtain them
as before in order to address concerns about speed, but the 2 new ones use
skb_flow_dissect. Unifying the hash function makes it possible to remove a
pointer from struct bonding and also a few extra functions that dealt with
it. Two new functions are added which take care of the hashing based on
bond->params.xmit_policy only:
bond_xmit_hash() - global function, used by XOR and 3ad modes
bond_flow_dissect() - used by bond_xmit_hash() to obtain the necessary
headers and combine them according to bond->params.xmit_policy.
Also factor out the ports extraction from skb_flow_dissect and add a new
function - skb_flow_get_ports() which can be re-used.

v2: add the flow_dissector patch and use skb_flow_get_ports in patch 02
v3: fix a bug in the flow_dissector patch that caused a different thoff
    by modifying the thoff argument in skb_flow_get_ports directly, most
    of the users already do it anyway.
    Also add the necessary export symbol for skb_flow_get_ports.
v4: integrate the thoff bug fix in patch 01
v5: disintegrate the thoff bug fix and re-base on top of Eric's fix
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
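
As an illustration of the unified hash described in the merge message, the folding steps can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the kernel code: the names sketch_fold and sketch_xmit_hash are hypothetical, addresses are treated as plain host-order integers, and the seed stands in for either the XOR of the last MAC bytes (layer2+3, encap2+3) or the ports word (layer3+4, encap3+4).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the folding steps; not kernel code. */
static uint32_t sketch_fold(uint32_t seed, uint32_t src, uint32_t dst)
{
	uint32_t hash = seed;

	hash ^= dst ^ src;	/* mix in both addresses */
	hash ^= hash >> 16;	/* fold the upper half into the lower half */
	hash ^= hash >> 8;	/* fold again: the low byte now XORs all four bytes */
	return hash;
}

/* Reduce the folded hash modulo the number of slaves. */
static uint32_t sketch_xmit_hash(uint32_t seed, uint32_t src, uint32_t dst,
				 uint32_t count)
{
	assert(count > 0);	/* a modulo by zero would be undefined */
	return sketch_fold(seed, src, dst) % count;
}
```

For example, with a seed of 0x5e ^ 0x01 (the last bytes of two made-up MAC addresses), source 0x0a000001 and destination 0xc0a80102, sketch_fold() yields 0xca62633f, so sketch_xmit_hash() with four slaves selects index 3.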
David S. Miller committed Oct 3, 2013
2 parents 5080546 + 7a6afab commit 99ebe9f
Showing 8 changed files with 137 additions and 175 deletions.
66 changes: 36 additions & 30 deletions Documentation/networking/bonding.txt
@@ -743,21 +743,16 @@ xmit_hash_policy
 		protocol information to generate the hash.

 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash. The IPv4 formula is
+		generate the hash. The formula is

-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.

-		The IPv6 formula is
-
-		hash = (source ip quad 2 XOR dest IP quad 2) XOR
-		       (source ip quad 3 XOR dest IP quad 3) XOR
-		       (source ip quad 4 XOR dest IP quad 4)
-
-		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			XOR (source MAC XOR destination MAC))
-				modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.

 		This algorithm will place all traffic to a particular
 		network peer on the same slave. For non-IP traffic,
@@ -779,32 +774,23 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.

-		The formula for unfragmented IPv4 TCP and UDP packets is
-
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		The formula for unfragmented TCP and UDP packets is

-		The formula for unfragmented IPv6 TCP and UDP packets is
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.

-		hash = (source port XOR dest port) XOR
-			((source ip quad 2 XOR dest IP quad 2) XOR
-			 (source ip quad 3 XOR dest IP quad 3) XOR
-			 (source ip quad 4 XOR dest IP quad 4))
-
-		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.

 		For fragmented TCP or UDP packets and all other IPv4 and
 		IPv6 protocol traffic, the source and destination port
 		information is omitted. For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.

-		The IPv4 policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant. A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -815,6 +801,26 @@ xmit_hash_policy
 		conversations. Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.

+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2. This option was added in bonding
 	version 2.6.3. In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy. The
2 changes: 1 addition & 1 deletion drivers/net/bonding/bond_3ad.c
@@ -2403,7 +2403,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}

-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
 	first_ok_slave = NULL;

 	bond_for_each_slave(bond, slave, iter) {
197 changes: 68 additions & 129 deletions drivers/net/bonding/bond_main.c
@@ -78,6 +78,7 @@
 #include <net/netns/generic.h>
 #include <net/pkt_sched.h>
 #include <linux/rculist.h>
+#include <net/flow_keys.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
-				   "2 for layer 2+3");
+				   "2 for layer 2+3, 3 for encap layer 2+3, "
+				   "4 for encap layer 3+4");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
 {	NULL,			-1},
 };

@@ -3035,99 +3039,85 @@ static struct notifier_block bond_netdev_notifier = {

 /*---------------------------- Hashing Policies -----------------------------*/

-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+/* L2 hash helper */
+static inline u32 bond_eth_hash(struct sk_buff *skb)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;

 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
-		return (data->h_dest[5] ^ data->h_source[5]) % count;
+		return data->h_dest[5] ^ data->h_source[5];

 	return 0;
 }

-/*
- * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
- */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+/* Extract the appropriate headers based on bond's xmit policy */
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+			      struct flow_keys *fk)
 {
-	const struct ethhdr *data;
+	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	u32 v6hash;
-	const __be32 *s, *d;
+	int noff, proto = -1;

-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_network_may_pull(skb, sizeof(*iph))) {
+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
+		return skb_flow_dissect(skb, fk);
+
+	fk->ports = 0;
+	noff = skb_network_offset(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
+			return false;
 		iph = ip_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ data->h_source[5])) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
-	}
-
-	return bond_xmit_hash_policy_l2(skb, count);
+		fk->src = iph->saddr;
+		fk->dst = iph->daddr;
+		noff += iph->ihl << 2;
+		if (!ip_is_fragment(iph))
+			proto = iph->protocol;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
+			return false;
+		iph6 = ipv6_hdr(skb);
+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
+		noff += sizeof(*iph6);
+		proto = iph6->nexthdr;
+	} else {
+		return false;
+	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
+		fk->ports = skb_flow_get_ports(skb, noff, proto);
+
+	return true;
 }

-/*
- * Hash for the output device based upon layer 3 and layer 4 data. If
- * the packet is a frag or not TCP or UDP, just use layer 3 data. If it is
- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ * @count: modulo value
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ * which will be reduced modulo count before returning.
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
 {
-	u32 layer4_xor = 0;
-	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	const __be32 *s, *d;
-	const __be16 *l4 = NULL;
-	__be16 _l4[2];
-	int noff = skb_network_offset(skb);
-	int poff;
-
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
-		iph = ip_hdr(skb);
-		poff = proto_ports_offset(iph->protocol);
+	struct flow_keys flow;
+	u32 hash;

-		if (!ip_is_fragment(iph) && poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		poff = proto_ports_offset(ipv6h->nexthdr);
-		if (poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
-			       (layer4_xor >> 8);
-		return layer4_xor % count;
-	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
+	    !bond_flow_dissect(bond, skb, &flow))
+		return bond_eth_hash(skb) % count;
+
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
+		hash = bond_eth_hash(skb);
+	else
+		hash = (__force u32)flow.ports;
+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
+	hash ^= (hash >> 16);
+	hash ^= (hash >> 8);

-	return bond_xmit_hash_policy_l2(skb, count);
+	return hash % count;
 }

 /*-------------------------- Device entry points ----------------------------*/
@@ -3721,17 +3711,15 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	return NETDEV_TX_OK;
 }

-/*
- * In bond_xmit_xor() , we determine the output device by using a pre-
+/* In bond_xmit_xor() , we determine the output device by using a pre-
  * determined xmit_hash_policy(), If the selected device is not enabled,
  * find the next active slave.
  */
 static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);

-	bond_xmit_slave_id(bond, skb,
-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));

 	return NETDEV_TX_OK;
 }
@@ -3768,22 +3756,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)

 /*------------------------- Device initialization ---------------------------*/

-static void bond_set_xmit_hash_policy(struct bonding *bond)
-{
-	switch (bond->params.xmit_policy) {
-	case BOND_XMIT_POLICY_LAYER23:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
-		break;
-	case BOND_XMIT_POLICY_LAYER34:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
-		break;
-	case BOND_XMIT_POLICY_LAYER2:
-	default:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
-		break;
-	}
-}
-
 /*
  * Lookup the slave that corresponds to a qid
  */
@@ -3894,38 +3866,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }

-/*
- * set bond mode specific net device operations
- */
-void bond_set_mode_ops(struct bonding *bond, int mode)
-{
-	struct net_device *bond_dev = bond->dev;
-
-	switch (mode) {
-	case BOND_MODE_ROUNDROBIN:
-		break;
-	case BOND_MODE_ACTIVEBACKUP:
-		break;
-	case BOND_MODE_XOR:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_BROADCAST:
-		break;
-	case BOND_MODE_8023AD:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_ALB:
-		/* FALLTHRU */
-	case BOND_MODE_TLB:
-		break;
-	default:
-		/* Should never happen, mode already checked */
-		pr_err("%s: Error: Unknown bonding mode %d\n",
-		       bond_dev->name, mode);
-		break;
-	}
-}
-
 static int bond_ethtool_get_settings(struct net_device *bond_dev,
 				     struct ethtool_cmd *ecmd)
 {
@@ -4027,7 +3967,6 @@ static void bond_setup(struct net_device *bond_dev)
 	ether_setup(bond_dev);
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
-	bond_set_mode_ops(bond, bond->params.mode);

 	bond_dev->destructor = bond_destructor;

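To illustrate why the encap2+3 and encap3+4 policies documented in the bonding.txt changes can help tunnel users, the sketch below defines keys for two tunneled flows as seen from the outer headers (what layer3+4 would hash) and from the inner headers (what encap3+4 might hash, assuming the flow dissector reaches them). Everything here is hypothetical user-space code: struct sketch_keys, sketch_hash34, the addresses and the way the ports are packed are assumptions for illustration, not the kernel's flow_keys layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the hash consumes. */
struct sketch_keys {
	uint32_t src;	/* source address as a plain integer */
	uint32_t dst;	/* destination address as a plain integer */
	uint32_t ports;	/* source port in the high half, destination port in the low half; 0 if none */
};

/* Same folding steps as the documented layer3+4 formula, reduced modulo count. */
static uint32_t sketch_hash34(const struct sketch_keys *k, uint32_t count)
{
	uint32_t hash = k->ports;

	assert(count > 0);	/* a modulo by zero would be undefined */
	hash ^= k->dst ^ k->src;
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash % count;
}

/* Outer tunnel header shared by both flows: 10.0.0.1 -> 10.0.0.2, no ports. */
static const struct sketch_keys outer = { 0x0a000001u, 0x0a000002u, 0 };

/* Two inner TCP flows between the same hosts, differing only in source port. */
static const struct sketch_keys inner_a = { 0xc0a80001u, 0xc0a80102u, (40000u << 16) | 80u };
static const struct sketch_keys inner_b = { 0xc0a80001u, 0xc0a80102u, (40001u << 16) | 80u };
```

With two slaves, sketch_hash34(&outer, 2) gives index 1 for both flows, so all tunnel traffic would share one slave, whereas hashing inner_a and inner_b gives 0 and 1 respectively, spreading the encapsulated flows.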