Skip to content

Commit

Permalink
Merge branch 'bridge-per-vlan-dst_metadata-support'
Browse files Browse the repository at this point in the history
Roopa Prabhu says:

====================
bridge: per vlan dst_metadata support

High level summary:
lwt and dst_metadata have enabled vxlan l3 deployments
to use a single vxlan netdev for multiple vnis eliminating the scalability
problem with using a single vxlan netdev per vni. This series tries to
do the same for vxlan netdevs in pure l2 bridged networks.
Use-case/deployment and details are below.

Deployment scerario details:
As we know VXLAN is used to build layer 2 virtual networks across the
underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
or a vswitch in the hypervisor. This patch series mainly
focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
along with vlan id is used to identify layer 2 segments in a vxlan
overlay network. Vxlan bridging is the function provided by Vteps to terminate
vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
the original Layer 2 packet if there is one before encapsulating the packet
into the VXLAN format to transmit it through the underlay network. The remote
VTEP devices have information about the VLAN in which the packet will be
placed based on their own VLAN-to-VXLAN VNI mapping configurations.

Existing solution:
Without this patch series one can deploy such a vtep configuration by
adding the local ports and vxlan netdevs into a vlan filtering bridge.
The local ports are configured as trunk ports carrying all vlans.
A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
The vxlan netdev only receives traffic corresponding to the vlan it is mapped
to. This configuration maps traffic belonging to a vlan to the corresponding
vxlan segment.

          -----------------------------------
         |              bridge               |
         |                                   |
          -----------------------------------
            |100,200       |100 (pvid)    |200 (pvid)
            |              |              |
           swp1          vxlan1000      vxlan2000

This provides the required vxlan bridging function but poses a
scalability problem with using a separate vxlan netdev for each vni.

Solution in this patch series:
The Goal is to use a single vxlan device to carry all vnis similar
to the vxlan collect metadata mode but additionally allowing the bridge
and vxlan driver to carry all the forwarding information and also learn.
This implementation uses the existing dst_metadata infrastructure to map
vlan to a tunnel id.
- vxlan driver changes:
    - enable collect metadata mode to be used with learning,
      replication and fdb
    - A single fdb table hashed by (mac, vni)
    - rx path already has the vni
    - tx path expects a vni in the packet with dst_metadata and relies
      on learnt or static forwarding information table to forward the packet

- Bridge driver changes: per vlan dst_metadata support:
    - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
      kept the api generic for any tunnel info
    - Uapi to configure/unconfigure/dump per vlan tunnel data
    - new bridge port flag to turn this feature on/off. off by default
    - ingress hook:
        - if port is a tunnel port, use tunnel info in
          attached dst_metadata to map it to a local vlan
    - egress hook:
        - if port is a tunnel port, use tunnel info attached to vlan
          to set dst_metadata on the skb

Other approaches tried and vetoed:
- tc vlan push/pop and tunnel metadata dst:
    - though tc can be used to do part of this, these patches address a deployment
      case where bridge driver vlan filtering and forwarding information
      database along with vxlan driver forwarding information table and learning
      are required.
- making vxlan driver understand vlan-vni mapping:
    - I had a series almost ready with this one but soon realized
      it duplicated a lot of vlan handling code in the vxlan driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Feb 3, 2017
2 parents 5a0fd98 + 11538d0 commit 3b19860
Show file tree
Hide file tree
Showing 15 changed files with 863 additions and 121 deletions.
196 changes: 125 additions & 71 deletions drivers/net/vxlan.c

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions include/linux/if_bridge.h
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ struct br_ip_list {
#define BR_PROXYARP_WIFI BIT(10)
#define BR_MCAST_FLOOD BIT(11)
#define BR_MULTICAST_TO_UNICAST BIT(12)
#define BR_VLAN_TUNNEL BIT(13)

#define BR_DEFAULT_AGEING_TIME (300 * HZ)

Expand Down
1 change: 1 addition & 0 deletions include/net/ip_tunnels.h
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ struct ip_tunnel_key {
/* Flags for ip_tunnel_info mode. */
#define IP_TUNNEL_INFO_TX 0x01 /* represents tx tunnel parameters */
#define IP_TUNNEL_INFO_IPV6 0x02 /* key contains IPv6 addresses */
#define IP_TUNNEL_INFO_BRIDGE 0x04 /* represents a bridged tunnel id */

/* Maximum tunnel options length. */
#define IP_TUNNEL_OPTS_MAX \
Expand Down
11 changes: 11 additions & 0 deletions include/uapi/linux/if_bridge.h
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ enum {
IFLA_BRIDGE_FLAGS,
IFLA_BRIDGE_MODE,
IFLA_BRIDGE_VLAN_INFO,
IFLA_BRIDGE_VLAN_TUNNEL_INFO,
__IFLA_BRIDGE_MAX,
};
#define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
Expand All @@ -134,6 +135,16 @@ struct bridge_vlan_info {
__u16 vid;
};

enum {
IFLA_BRIDGE_VLAN_TUNNEL_UNSPEC,
IFLA_BRIDGE_VLAN_TUNNEL_ID,
IFLA_BRIDGE_VLAN_TUNNEL_VID,
IFLA_BRIDGE_VLAN_TUNNEL_FLAGS,
__IFLA_BRIDGE_VLAN_TUNNEL_MAX,
};

#define IFLA_BRIDGE_VLAN_TUNNEL_MAX (__IFLA_BRIDGE_VLAN_TUNNEL_MAX - 1)

struct bridge_vlan_xstats {
__u64 rx_bytes;
__u64 rx_packets;
Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/if_link.h
Original file line number Diff line number Diff line change
Expand Up @@ -322,6 +322,7 @@ enum {
IFLA_BRPORT_PAD,
IFLA_BRPORT_MCAST_FLOOD,
IFLA_BRPORT_MCAST_TO_UCAST,
IFLA_BRPORT_VLAN_TUNNEL,
__IFLA_BRPORT_MAX
};
#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/neighbour.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ enum {
NDA_IFINDEX,
NDA_MASTER,
NDA_LINK_NETNSID,
NDA_SRC_VNI,
__NDA_MAX
};

Expand Down
5 changes: 3 additions & 2 deletions net/bridge/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ obj-$(CONFIG_BRIDGE) += bridge.o

bridge-y := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
br_ioctl.o br_stp.o br_stp_bpdu.o \
br_stp_if.o br_stp_timer.o br_netlink.o
br_stp_if.o br_stp_timer.o br_netlink.o \
br_netlink_tunnel.o

bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o

Expand All @@ -18,7 +19,7 @@ obj-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o

bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o br_mdb.o

bridge-$(CONFIG_BRIDGE_VLAN_FILTERING) += br_vlan.o
bridge-$(CONFIG_BRIDGE_VLAN_FILTERING) += br_vlan.o br_vlan_tunnel.o

bridge-$(CONFIG_NET_SWITCHDEV) += br_switchdev.o

Expand Down
2 changes: 1 addition & 1 deletion net/bridge/br_forward.c
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ static void __br_forward(const struct net_bridge_port *to,
int br_hook;

vg = nbp_vlan_group_rcu(to);
skb = br_handle_vlan(to->br, vg, skb);
skb = br_handle_vlan(to->br, to, vg, skb);
if (!skb)
return;

Expand Down
8 changes: 7 additions & 1 deletion net/bridge/br_input.c
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <linux/export.h>
#include <linux/rculist.h>
#include "br_private.h"
#include "br_private_tunnel.h"

/* Hook for brouter */
br_should_route_hook_t __rcu *br_should_route_hook __read_mostly;
Expand Down Expand Up @@ -57,7 +58,7 @@ static int br_pass_frame_up(struct sk_buff *skb)

indev = skb->dev;
skb->dev = brdev;
skb = br_handle_vlan(br, vg, skb);
skb = br_handle_vlan(br, NULL, vg, skb);
if (!skb)
return NET_RX_DROP;
/* update the multicast stats if the packet is IGMP/MLD */
Expand Down Expand Up @@ -261,6 +262,11 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;

p = br_port_get_rcu(skb->dev);
if (p->flags & BR_VLAN_TUNNEL) {
if (br_handle_ingress_vlan_tunnel(skb, p,
nbp_vlan_group_rcu(p)))
goto drop;
}

if (unlikely(is_link_local_ether_addr(dest))) {
u16 fwd_mask = p->br->group_fwd_mask_required;
Expand Down
140 changes: 95 additions & 45 deletions net/bridge/br_netlink.c
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@

#include "br_private.h"
#include "br_private_stp.h"
#include "br_private_tunnel.h"

static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg,
u32 filter_mask)
Expand Down Expand Up @@ -95,9 +96,10 @@ static size_t br_get_link_af_size_filtered(const struct net_device *dev,
u32 filter_mask)
{
struct net_bridge_vlan_group *vg = NULL;
struct net_bridge_port *p;
struct net_bridge_port *p = NULL;
struct net_bridge *br;
int num_vlan_infos;
size_t vinfo_sz = 0;

rcu_read_lock();
if (br_port_exists(dev)) {
Expand All @@ -110,8 +112,13 @@ static size_t br_get_link_af_size_filtered(const struct net_device *dev,
num_vlan_infos = br_get_num_vlan_infos(vg, filter_mask);
rcu_read_unlock();

if (p && (p->flags & BR_VLAN_TUNNEL))
vinfo_sz += br_get_vlan_tunnel_info_size(vg);

/* Each VLAN is returned in bridge_vlan_info along with flags */
return num_vlan_infos * nla_total_size(sizeof(struct bridge_vlan_info));
vinfo_sz += num_vlan_infos * nla_total_size(sizeof(struct bridge_vlan_info));

return vinfo_sz;
}

static inline size_t br_port_info_size(void)
Expand All @@ -128,6 +135,7 @@ static inline size_t br_port_info_size(void)
+ nla_total_size(1) /* IFLA_BRPORT_UNICAST_FLOOD */
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP */
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP_WIFI */
+ nla_total_size(1) /* IFLA_BRPORT_VLAN_TUNNEL */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_ROOT_ID */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* IFLA_BRPORT_BRIDGE_ID */
+ nla_total_size(sizeof(u16)) /* IFLA_BRPORT_DESIGNATED_PORT */
Expand Down Expand Up @@ -194,7 +202,9 @@ static int br_port_fill_attrs(struct sk_buff *skb,
nla_put_u16(skb, IFLA_BRPORT_NO, p->port_no) ||
nla_put_u8(skb, IFLA_BRPORT_TOPOLOGY_CHANGE_ACK,
p->topology_change_ack) ||
nla_put_u8(skb, IFLA_BRPORT_CONFIG_PENDING, p->config_pending))
nla_put_u8(skb, IFLA_BRPORT_CONFIG_PENDING, p->config_pending) ||
nla_put_u8(skb, IFLA_BRPORT_VLAN_TUNNEL, !!(p->flags &
BR_VLAN_TUNNEL)))
return -EMSGSIZE;

timerval = br_timer_value(&p->message_age_timer);
Expand Down Expand Up @@ -417,6 +427,9 @@ static int br_fill_ifinfo(struct sk_buff *skb,
err = br_fill_ifvlaninfo_compressed(skb, vg);
else
err = br_fill_ifvlaninfo(skb, vg);

if (port && (port->flags & BR_VLAN_TUNNEL))
err = br_fill_vlan_tunnel_info(skb, vg);
rcu_read_unlock();
if (err)
goto nla_put_failure;
Expand Down Expand Up @@ -517,60 +530,91 @@ static int br_vlan_info(struct net_bridge *br, struct net_bridge_port *p,
return err;
}

static int br_process_vlan_info(struct net_bridge *br,
struct net_bridge_port *p, int cmd,
struct bridge_vlan_info *vinfo_curr,
struct bridge_vlan_info **vinfo_last)
{
if (!vinfo_curr->vid || vinfo_curr->vid >= VLAN_VID_MASK)
return -EINVAL;

if (vinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
/* check if we are already processing a range */
if (*vinfo_last)
return -EINVAL;
*vinfo_last = vinfo_curr;
/* don't allow range of pvids */
if ((*vinfo_last)->flags & BRIDGE_VLAN_INFO_PVID)
return -EINVAL;
return 0;
}

if (*vinfo_last) {
struct bridge_vlan_info tmp_vinfo;
int v, err;

if (!(vinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_END))
return -EINVAL;

if (vinfo_curr->vid <= (*vinfo_last)->vid)
return -EINVAL;

memcpy(&tmp_vinfo, *vinfo_last,
sizeof(struct bridge_vlan_info));
for (v = (*vinfo_last)->vid; v <= vinfo_curr->vid; v++) {
tmp_vinfo.vid = v;
err = br_vlan_info(br, p, cmd, &tmp_vinfo);
if (err)
break;
}
*vinfo_last = NULL;

return 0;
}

return br_vlan_info(br, p, cmd, vinfo_curr);
}

static int br_afspec(struct net_bridge *br,
struct net_bridge_port *p,
struct nlattr *af_spec,
int cmd)
{
struct bridge_vlan_info *vinfo_start = NULL;
struct bridge_vlan_info *vinfo = NULL;
struct bridge_vlan_info *vinfo_curr = NULL;
struct bridge_vlan_info *vinfo_last = NULL;
struct nlattr *attr;
int err = 0;
int rem;
struct vtunnel_info tinfo_last = {};
struct vtunnel_info tinfo_curr = {};
int err = 0, rem;

nla_for_each_nested(attr, af_spec, rem) {
if (nla_type(attr) != IFLA_BRIDGE_VLAN_INFO)
continue;
if (nla_len(attr) != sizeof(struct bridge_vlan_info))
return -EINVAL;
vinfo = nla_data(attr);
if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
return -EINVAL;
if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
if (vinfo_start)
err = 0;
switch (nla_type(attr)) {
case IFLA_BRIDGE_VLAN_TUNNEL_INFO:
if (!(p->flags & BR_VLAN_TUNNEL))
return -EINVAL;
vinfo_start = vinfo;
/* don't allow range of pvids */
if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
err = br_parse_vlan_tunnel_info(attr, &tinfo_curr);
if (err)
return err;
err = br_process_vlan_tunnel_info(br, p, cmd,
&tinfo_curr,
&tinfo_last);
if (err)
return err;
break;
case IFLA_BRIDGE_VLAN_INFO:
if (nla_len(attr) != sizeof(struct bridge_vlan_info))
return -EINVAL;
continue;
vinfo_curr = nla_data(attr);
err = br_process_vlan_info(br, p, cmd, vinfo_curr,
&vinfo_last);
if (err)
return err;
break;
}

if (vinfo_start) {
struct bridge_vlan_info tmp_vinfo;
int v;

if (!(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
return -EINVAL;

if (vinfo->vid <= vinfo_start->vid)
return -EINVAL;

memcpy(&tmp_vinfo, vinfo_start,
sizeof(struct bridge_vlan_info));

for (v = vinfo_start->vid; v <= vinfo->vid; v++) {
tmp_vinfo.vid = v;
err = br_vlan_info(br, p, cmd, &tmp_vinfo);
if (err)
break;
}
vinfo_start = NULL;
} else {
err = br_vlan_info(br, p, cmd, vinfo);
}
if (err)
break;
return err;
}

return err;
Expand Down Expand Up @@ -630,8 +674,9 @@ static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
/* Process bridge protocol info on port */
static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
{
int err;
unsigned long old_flags = p->flags;
bool br_vlan_tunnel_old = false;
int err;

br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
Expand All @@ -644,6 +689,11 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);

br_vlan_tunnel_old = (p->flags & BR_VLAN_TUNNEL) ? true : false;
br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
if (br_vlan_tunnel_old && !(p->flags & BR_VLAN_TUNNEL))
nbp_vlan_tunnel_info_flush(p);

if (tb[IFLA_BRPORT_COST]) {
err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
if (err)
Expand Down
Loading

0 comments on commit 3b19860

Please sign in to comment.