Merge branch 'Identifier-Locator-Addressing'
Tom Herbert says:

====================
net: Identifier Locator Addressing - Part I

This patch set provides rudimentary support for Identifier Locator
Addressing or ILA. The basic concept of ILA is that we split an IPv6
address into a 64 bit locator and 64 bit identifier. The identifier is
the identity of an entity in communication ("who"), and the locator
expresses the location of the entity ("where"). Applications
use an externally visible address that contains the identifier.
When a packet is actually sent, a translation is done that
overwrites the first 64 bits of the address with a locator.
The packet can then be forwarded over the network to the host where
the addressed entity is located. At the receiver, the reverse
translation is done so that the application sees the original,
untranslated address. Presumably an external control plane will
provide identifier->locator mappings.

v2:
  - Fix compilation errors when LWT not configured
  - Consolidate ILA into a single ila.c

v3:
  - Change pseudohdr argument of inet_proto_csum_replace functions to
    be a bool

v4:
  - In ila_build_state check locator being in netlink params before
    allocating tunnel state

The data path for ILA is a simple NAT translation that only operates
on the upper 64 bits of a destination address in IPv6 packets. The
basic process is:

   1) Lookup 64 bit identifier (lower 64 bits of destination)
   2) If a match is found
      a) Overwrite locator (upper 64 bits of destination) with
         the new locator
      b) Adjust any checksum that has destination address included in
         pseudo header
   3) Send or receive packet
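
Steps 1 and 2 above can be sketched in userspace C. This is an illustration
only, not the kernel code from this series: struct ila_map, csum16() and
ila_translate() are hypothetical names, the mapping table is a linear array,
and the checksum is held as a plain 16-bit number built from big-endian words.
The checksum adjustment uses the standard incremental update from RFC 1624,
HC' = ~(~HC + ~m + m'), applied to the four 16-bit words of the locator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping entry: identifier (low 64 bits of the destination)
 * to locator (new high 64 bits). Both are raw bytes in network order. */
struct ila_map {
	uint8_t identifier[8];
	uint8_t locator[8];
};

/* One's-complement sum of a buffer read as big-endian 16-bit words,
 * added to a running sum and folded to 16 bits. len must be even. */
static uint16_t csum16(const uint8_t *p, size_t len, uint32_t sum)
{
	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Look up the identifier; on a match, fix up *check for the locator
 * change and overwrite the locator. Returns 1 on a match, 0 otherwise. */
static int ila_translate(uint8_t daddr[16], uint16_t *check,
			 const struct ila_map *map, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(daddr + 8, map[i].identifier, 8) != 0)
			continue;

		/* RFC 1624: HC' = ~(~HC + ~m + m') over the changed words */
		uint32_t sum = (uint16_t)~*check;

		for (int w = 0; w < 8; w += 2) {
			uint16_t old_w = (uint16_t)(daddr[w] << 8 | daddr[w + 1]);
			uint16_t new_w = (uint16_t)(map[i].locator[w] << 8 |
						    map[i].locator[w + 1]);

			sum += (uint16_t)~old_w;
			sum += new_w;
		}
		while (sum >> 16)
			sum = (sum & 0xffff) + (sum >> 16);
		*check = (uint16_t)~sum;

		memcpy(daddr, map[i].locator, 8);
		return 1;
	}
	return 0;
}
```

Because only the locator words change, the update touches the checksum
without reading the payload, which is why the translation stays cheap
regardless of packet size.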

ILA is a means to implement tunnels or network virtualization without
encapsulation. Since there is no encapsulation involved, we assume that
stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
just works. Also, since we're minimally changing the packet many of
the worries about encapsulation (MTU, checksum, fragmentation) are
not relevant. The downside is that ILA is not extensible like other
encapsulations (GUE for instance) so it might not be appropriate for
all use cases. Also, this only makes sense to do in IPv6!

A key aspect of ILA is performance. The intent is that ILA would be
used in data centers in virtualizing tasks or jobs. In the fullest
incarnation all intra data center communications might be targeted to
virtual ILA addresses. This is basically adding a new virtualization
capability to the existing services in a datacenter, so there is a
strong expectation that this does not degrade performance for
existing applications.

Performance seems to be dependent on how ILA is hooked into the kernel.
ILA can be implemented under some different models:

  - Mechanically it is a form of stateless DNAT
  - It can be thought of as a type of (source) routing
  - As a functional replacement of encapsulation

In this patch set we hook into the data path using Light Weight
Tunnels (LWT) infrastructure. As part of that, we add support in LWT
to redirect dst input. iproute will be modified to take a new ila encap
type. ILA can be configured like:

ip route add 3333:0:0:1:5555:0:2:0/128 \
   encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0

ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0

ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
   encap ila 3333:0:0:1 dev lo

So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
of 2001:0:0:2:5555:0:2:0 on the wire.
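
The address arithmetic in that example can be checked with a few lines of
userspace C. This is a sketch under assumptions: ila_apply_locator() is a
hypothetical helper, and the locator is written as an IPv6 prefix
("2001:0:0:2::") so that inet_pton() can parse it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Overwrite the upper 64 bits of dst with the upper 64 bits of
 * loc_prefix and store the result in out.
 * Returns 0 on success, -1 if either string fails to parse. */
static int ila_apply_locator(const char *dst, const char *loc_prefix,
			     struct in6_addr *out)
{
	struct in6_addr loc;

	assert(dst && loc_prefix && out);
	if (inet_pton(AF_INET6, dst, out) != 1 ||
	    inet_pton(AF_INET6, loc_prefix, &loc) != 1)
		return -1;

	/* Bytes 0..7 are the locator, bytes 8..15 the identifier. */
	memcpy(out->s6_addr, loc.s6_addr, 8);
	return 0;
}
```

Calling it with "3333:0:0:1:5555:0:2:0" and "2001:0:0:2::" yields the same
16 bytes as parsing "2001:0:0:2:5555:0:2:0"; the identifier half
(5555:0:2:0) is left untouched.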

Performance results are below. With ILA we see about a 10% drop in
pps compared to non-ILA. Much of this drop can be attributed to the
loss of early demux on input (translation occurs after it is attempted).
We will address this in the next patch set. Also, IPvlan input path
does not work with ILA since the routing is bypassed; this will
be addressed in a future patch.

Performance testing:

Performing netperf TCP_RR with 200 clients:

Non-ILA baseline
  84.92% CPU utilization
  1861922.9 tps
  93/163/330 50/90/99% latencies

ILA single destination
  83.16% CPU utilization
  1679683.4 tps
  105/180/332 50/90/99% latencies
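
The "about 10%" figure follows directly from the two tps numbers above. A
small helper (pct_drop() is a hypothetical name) makes the arithmetic
explicit:

```c
#include <assert.h>

/* Relative drop of measured versus baseline, in percent. */
static double pct_drop(double baseline, double measured)
{
	assert(baseline > 0.0);
	return 100.0 * (baseline - measured) / baseline;
}
```

pct_drop(1861922.9, 1679683.4) comes out at roughly 9.8, i.e. the ILA run
delivers about 9.8% fewer transactions per second than the baseline.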

References:

Slides from netconf:
http://vger.kernel.org/netconf2015Herbert-ILA.pdf

Slides from presentation at IETF:
https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf

I-D:
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Aug 18, 2015
2 parents f376d4a + 65d7ab8 commit 0b233dc
Showing 27 changed files with 403 additions and 39 deletions.
8 changes: 5 additions & 3 deletions include/net/checksum.h
@@ -140,14 +140,16 @@ static inline void csum_replace2(__sum16 *sum, __be16 old, __be16 new)

struct sk_buff;
void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-__be32 from, __be32 to, int pseudohdr);
+__be32 from, __be32 to, bool pseudohdr);
void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
const __be32 *from, const __be32 *to,
-int pseudohdr);
+bool pseudohdr);
void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
__wsum diff, bool pseudohdr);

static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
__be16 from, __be16 to,
-int pseudohdr)
+bool pseudohdr)
{
inet_proto_csum_replace4(sum, skb, (__force __be32)from,
(__force __be32)to, pseudohdr);
30 changes: 29 additions & 1 deletion include/net/lwtunnel.h
@@ -11,12 +11,15 @@
#define LWTUNNEL_HASH_SIZE (1 << LWTUNNEL_HASH_BITS)

/* lw tunnel state flags */
-#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT BIT(0)
+#define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1)

struct lwtunnel_state {
__u16 type;
__u16 flags;
atomic_t refcnt;
int (*orig_output)(struct sock *sk, struct sk_buff *skb);
int (*orig_input)(struct sk_buff *);
int len;
__u8 data[0];
};
@@ -25,6 +28,7 @@ struct lwtunnel_encap_ops {
int (*build_state)(struct net_device *dev, struct nlattr *encap,
struct lwtunnel_state **ts);
int (*output)(struct sock *sk, struct sk_buff *skb);
int (*input)(struct sk_buff *skb);
int (*fill_encap)(struct sk_buff *skb,
struct lwtunnel_state *lwtstate);
int (*get_encap_size)(struct lwtunnel_state *lwtstate);
@@ -58,6 +62,13 @@ static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
return false;
}

static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
{
if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_INPUT_REDIRECT))
return true;

return false;
}
int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
unsigned int num);
int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
@@ -72,6 +83,8 @@ struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
int lwtunnel_input(struct sk_buff *skb);
int lwtunnel_input6(struct sk_buff *skb);

#else

@@ -90,6 +103,11 @@ static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
return false;
}

static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
{
return false;
}

static inline int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
unsigned int num)
{
@@ -142,6 +160,16 @@ static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
return -EOPNOTSUPP;
}

static inline int lwtunnel_input(struct sk_buff *skb)
{
return -EOPNOTSUPP;
}

static inline int lwtunnel_input6(struct sk_buff *skb)
{
return -EOPNOTSUPP;
}

#endif

#endif /* __NET_LWTUNNEL_H */
15 changes: 15 additions & 0 deletions include/uapi/linux/ila.h
@@ -0,0 +1,15 @@
/* ila.h - ILA Interface */

#ifndef _UAPI_LINUX_ILA_H
#define _UAPI_LINUX_ILA_H

enum {
ILA_ATTR_UNSPEC,
ILA_ATTR_LOCATOR, /* u64 */

__ILA_ATTR_MAX,
};

#define ILA_ATTR_MAX (__ILA_ATTR_MAX - 1)

#endif /* _UAPI_LINUX_ILA_H */
1 change: 1 addition & 0 deletions include/uapi/linux/lwtunnel.h
@@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_NONE,
LWTUNNEL_ENCAP_MPLS,
LWTUNNEL_ENCAP_IP,
LWTUNNEL_ENCAP_ILA,
__LWTUNNEL_ENCAP_MAX,
};

2 changes: 1 addition & 1 deletion net/core/filter.c
@@ -1349,7 +1349,7 @@ const struct bpf_func_proto bpf_l3_csum_replace_proto = {
static u64 bpf_l4_csum_replace(u64 r1, u64 r2, u64 from, u64 to, u64 flags)
{
struct sk_buff *skb = (struct sk_buff *) (long) r1;
-u32 is_pseudo = BPF_IS_PSEUDO_HEADER(flags);
+bool is_pseudo = !!BPF_IS_PSEUDO_HEADER(flags);
int offset = (int) r2;
__sum16 sum, *ptr;

55 changes: 55 additions & 0 deletions net/core/lwtunnel.c
@@ -241,3 +241,58 @@ int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
return __lwtunnel_output(sk, skb, lwtstate);
}
EXPORT_SYMBOL(lwtunnel_output);

int __lwtunnel_input(struct sk_buff *skb,
struct lwtunnel_state *lwtstate)
{
const struct lwtunnel_encap_ops *ops;
int ret = -EINVAL;

if (!lwtstate)
goto drop;

if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
lwtstate->type > LWTUNNEL_ENCAP_MAX)
return 0;

ret = -EOPNOTSUPP;
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
if (likely(ops && ops->input))
ret = ops->input(skb);
rcu_read_unlock();

if (ret == -EOPNOTSUPP)
goto drop;

return ret;

drop:
kfree_skb(skb);

return ret;
}

int lwtunnel_input6(struct sk_buff *skb)
{
struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
struct lwtunnel_state *lwtstate = NULL;

if (rt)
lwtstate = rt->rt6i_lwtstate;

return __lwtunnel_input(skb, lwtstate);
}
EXPORT_SYMBOL(lwtunnel_input6);

int lwtunnel_input(struct sk_buff *skb)
{
struct rtable *rt = (struct rtable *)skb_dst(skb);
struct lwtunnel_state *lwtstate = NULL;

if (rt)
lwtstate = rt->rt_lwtstate;

return __lwtunnel_input(skb, lwtstate);
}
EXPORT_SYMBOL(lwtunnel_input);
17 changes: 15 additions & 2 deletions net/core/utils.c
@@ -301,7 +301,7 @@ int in6_pton(const char *src, int srclen,
EXPORT_SYMBOL(in6_pton);

void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-__be32 from, __be32 to, int pseudohdr)
+__be32 from, __be32 to, bool pseudohdr)
{
if (skb->ip_summed != CHECKSUM_PARTIAL) {
csum_replace4(sum, from, to);
@@ -318,7 +318,7 @@ EXPORT_SYMBOL(inet_proto_csum_replace4);

void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
const __be32 *from, const __be32 *to,
-int pseudohdr)
+bool pseudohdr)
{
__be32 diff[] = {
~from[0], ~from[1], ~from[2], ~from[3],
@@ -336,6 +336,19 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
}
EXPORT_SYMBOL(inet_proto_csum_replace16);

void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
__wsum diff, bool pseudohdr)
{
if (skb->ip_summed != CHECKSUM_PARTIAL) {
*sum = csum_fold(csum_add(diff, ~csum_unfold(*sum)));
if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
skb->csum = ~csum_add(diff, ~skb->csum);
} else if (pseudohdr) {
*sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
}
}
EXPORT_SYMBOL(inet_proto_csum_replace_by_diff);

struct __net_random_once_work {
struct work_struct work;
struct static_key *key;
2 changes: 1 addition & 1 deletion net/ipv4/netfilter/ipt_ECN.c
@@ -72,7 +72,7 @@ set_ect_tcp(struct sk_buff *skb, const struct ipt_ECN_info *einfo)
tcph->cwr = einfo->proto.tcp.cwr;

inet_proto_csum_replace2(&tcph->check, skb,
-oldval, ((__be16 *)tcph)[6], 0);
+oldval, ((__be16 *)tcph)[6], false);
return true;
}

4 changes: 2 additions & 2 deletions net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -120,7 +120,7 @@ static void nf_nat_ipv4_csum_update(struct sk_buff *skb,
oldip = iph->daddr;
newip = t->dst.u3.ip;
}
-inet_proto_csum_replace4(check, skb, oldip, newip, 1);
+inet_proto_csum_replace4(check, skb, oldip, newip, true);
}

static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb,
@@ -151,7 +151,7 @@ static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb,
}
} else
inet_proto_csum_replace2(check, skb,
-htons(oldlen), htons(datalen), 1);
+htons(oldlen), htons(datalen), true);
}

#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
2 changes: 1 addition & 1 deletion net/ipv4/netfilter/nf_nat_proto_icmp.c
@@ -67,7 +67,7 @@ icmp_manip_pkt(struct sk_buff *skb,

hdr = (struct icmphdr *)(skb->data + hdroff);
inet_proto_csum_replace2(&hdr->checksum, skb,
-hdr->un.echo.id, tuple->src.u.icmp.id, 0);
+hdr->un.echo.id, tuple->src.u.icmp.id, false);
hdr->un.echo.id = tuple->src.u.icmp.id;
return true;
}
8 changes: 7 additions & 1 deletion net/ipv4/route.c
@@ -1631,8 +1631,14 @@ static int __mkroute_input(struct sk_buff *skb,
rth->dst.output = ip_output;

rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
-if (lwtunnel_output_redirect(rth->rt_lwtstate))
+if (lwtunnel_output_redirect(rth->rt_lwtstate)) {
+rth->rt_lwtstate->orig_output = rth->dst.output;
 rth->dst.output = lwtunnel_output;
+}
+if (lwtunnel_input_redirect(rth->rt_lwtstate)) {
+rth->rt_lwtstate->orig_input = rth->dst.input;
+rth->dst.input = lwtunnel_input;
+}
skb_dst_set(skb, &rth->dst);
out:
err = 0;
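
The hunk above saves the route's original input handler in the tunnel state
and points the route at the LWT handler instead. The pattern can be mocked
in userspace C as follows. Everything here is a hypothetical stand-in
(struct pkt, struct route, tunnel_input() and so on are not kernel types),
and the assumption that the encap handler finishes by calling the saved
handler is not visible in the hunks shown here.

```c
#include <assert.h>
#include <stddef.h>

#define STATE_OUTPUT_REDIRECT (1u << 0)
#define STATE_INPUT_REDIRECT  (1u << 1)

struct route;

/* Stand-in for an skb that knows which route it was matched to. */
struct pkt {
	struct route *rt;
	int translated;
	int delivered;
};

/* Stand-in for struct lwtunnel_state. */
struct tunnel_state {
	unsigned int flags;
	int (*orig_input)(struct pkt *);
};

/* Stand-in for a routing entry with an input handler. */
struct route {
	int (*input)(struct pkt *);
	struct tunnel_state *state;
};

/* The handler a route would use without any tunnel state. */
static int plain_input(struct pkt *p)
{
	p->delivered = 1;
	return 0;
}

/* Redirected handler: do the tunnel-specific work, then continue on
 * the handler that was saved when the route was set up (assumed). */
static int tunnel_input(struct pkt *p)
{
	struct tunnel_state *st = p->rt->state;

	p->translated = 1; /* stand-in for the locator rewrite */
	return st->orig_input(p);
}

/* Mirrors the shape of the hunk: save the original, then redirect. */
static void route_setup(struct route *rt, struct tunnel_state *st)
{
	assert(rt != NULL);
	rt->input = plain_input;
	rt->state = st;
	if (st && (st->flags & STATE_INPUT_REDIRECT)) {
		st->orig_input = rt->input;
		rt->input = tunnel_input;
	}
}
```

Saving the original pointer per state, rather than hard-coding the next
function, lets the same redirect work for whatever handler the route would
otherwise have used.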
19 changes: 19 additions & 0 deletions net/ipv6/Kconfig
@@ -92,6 +92,25 @@ config IPV6_MIP6

If unsure, say N.

config IPV6_ILA
tristate "IPv6: Identifier Locator Addressing (ILA)"
select LWTUNNEL
---help---
Support for IPv6 Identifier Locator Addressing (ILA).

ILA is a mechanism to do network virtualization without
encapsulation. The basic concept of ILA is that we split an
IPv6 address into a 64 bit locator and 64 bit identifier. The
identifier is the identity of an entity in communication
("who") and the locator expresses the location of the
entity ("where").

ILA can be configured using the "encap ila" option with the
"ip -6 route" command. ILA is described in
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00.

If unsure, say N.

config INET6_XFRM_TUNNEL
tristate
select INET6_TUNNEL
1 change: 1 addition & 0 deletions net/ipv6/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
obj-$(CONFIG_IPV6_MIP6) += mip6.o
obj-$(CONFIG_IPV6_ILA) += ila.o
obj-$(CONFIG_NETFILTER) += netfilter/

obj-$(CONFIG_IPV6_VTI) += ip6_vti.o