Skip to content

Commit

Permalink
Merge branch 'rtnetlink-add-IFA_TARGET_NETNSID-for-RTM_GETADDR'
Browse files Browse the repository at this point in the history
Christian Brauner says:

====================
rtnetlink: add IFA_TARGET_NETNSID for RTM_GETADDR

This iteration should mainly addresses the suggestion to use
IFA_TARGET_NETNSID as the property name. Additionally, an an alias for
the already existing IFLA_IF_NETNSID property is added.

Note that two additional cleanup patches (8\9 and 9\9) were added to
address concerns raised that passing more than 6 arguments to a function
will cause additional variables to be pushed onto the stack instead of
being placed into registers. The way I addressed this is by introducing
two new struct inet{6}_fill_args that are used to pass common
information down to inet{6}_fill_if*() functions shortening all those
functions to three pointer arguments.
If this is something more people than Kirill find useful they can be
kept if not they can simply be dropped in later iterations of this
series or when merging.

Here is a short overview:
1. Rename from IFA_IF_NETNSID to IFA_TARGET_NETNSID.
2. Add IFLA_TARGET_NETNSID as an alias for IFA_IFLA_NETNSID and switch
   all occurrences over to the new alias.
3. Add inet4_fill_args struct to avoid passing more than 6 arguments in
   inet_fill_if*() functions.
4. Add inet6_fill_args struct to avoid passing more than 6 arguments in
   inet_fill_if*() functions.

The only functional change is the export of rtnl_get_net_ns_capable()
which is needed in case ipv6 is built as a module.

Note, I did not change the property name to IFA_TARGET_NSID as there was
no clear agreement what would be preferred. My personal preference is to
keep the IFA_IF_NETNSID name because it aligns naturally with the
IFLA_IF_NETNSID property for RTM_*LINK requests. Jiri seems to prefer
this name too.
However, if there is agreement that another property name makes more
sense I'm happy to send a v2 that changes this.

To test this patchset I performed 1 million getifaddrs() requests
against a network namespace containing 5 interfaces (lo, eth{0-4}). The
first test used a network namespace aware getifaddrs() implementation I
wrote and the second test used the traditional setns() + getifaddrs()
method. The results show that this patchsets allows userspace to cut
retrieval time in half:
1. netns_getifaddrs():      82 microseconds
2. setns() + getifaddrs(): 162 microseconds

A while back we introduced and enabled IFLA_IF_NETNSID in
RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
to signficant performance increases since it allows userspace to avoid
taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
interfaces from the netns associated with the netns_fd. Especially when a
lot of network namespaces are in use, using setns() becomes increasingly
problematic when performance matters.
Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
getifaddrs() style functions and friends). But currently, RTM_GETADDR
requests do not support a similar property like IFLA_IF_NETNSID for
RTM_*LINK requests.
This is problematic since userspace can retrieve interfaces from another
network namespace by sending a IFLA_IF_NETNSID property along but
RTM_GETLINK request but is still forced to use the legacy setns() style of
retrieving interfaces in RTM_GETADDR requests.

The goal of this series is to make it possible to perform RTM_GETADDR
requests on different network namespaces. To this end a new IFA_IF_NETNSID
property for RTM_*ADDR requests is introduced. It can be used to send a
network namespace identifier along in RTM_*ADDR requests.  The network
namespace identifier will be used to retrieve the target network namespace
in which the request is supposed to be fulfilled.  This aligns the behavior
of RTM_*ADDR requests with the behavior of RTM_*LINK requests.

- The caller must have assigned a valid network namespace identifier for
  the target network namespace.
- The caller must have CAP_NET_ADMIN in the owning user namespace of the
  target network namespace.

[1]: commit 7973bfd ("rtnetlink: remove check for IFLA_IF_NETNSID")
[2]: commit 5bb8ed0 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
[3]: commit b61ad68 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
[4]: commit c310bfc ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
[5]: commit 7c4f63b ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Sep 6, 2018
2 parents d4cc597 + 203651b commit 6ef848e
Show file tree
Hide file tree
Showing 7 changed files with 180 additions and 63 deletions.
1 change: 1 addition & 0 deletions include/net/rtnetlink.h
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm);

int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len,
struct netlink_ext_ack *exterr);
struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);

#define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)

Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/if_addr.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ enum {
IFA_MULTICAST,
IFA_FLAGS,
IFA_RT_PRIORITY, /* u32, priority/metric for prefix route */
IFA_TARGET_NETNSID,
__IFA_MAX,
};

Expand Down
1 change: 1 addition & 0 deletions include/uapi/linux/if_link.h
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,7 @@ enum {
IFLA_EVENT,
IFLA_NEW_NETNSID,
IFLA_IF_NETNSID,
IFLA_TARGET_NETNSID = IFLA_IF_NETNSID, /* new alias */
IFLA_CARRIER_UP_COUNT,
IFLA_CARRIER_DOWN_COUNT,
IFLA_NEW_IFINDEX,
Expand Down
51 changes: 30 additions & 21 deletions net/core/rtnetlink.c
Original file line number Diff line number Diff line change
Expand Up @@ -1016,7 +1016,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
+ nla_total_size(4) /* IFLA_NEW_NETNSID */
+ nla_total_size(4) /* IFLA_NEW_IFINDEX */
+ nla_total_size(1) /* IFLA_PROTO_DOWN */
+ nla_total_size(4) /* IFLA_IF_NETNSID */
+ nla_total_size(4) /* IFLA_TARGET_NETNSID */
+ nla_total_size(4) /* IFLA_CARRIER_UP_COUNT */
+ nla_total_size(4) /* IFLA_CARRIER_DOWN_COUNT */
+ nla_total_size(4) /* IFLA_MIN_MTU */
Expand Down Expand Up @@ -1598,7 +1598,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
ifm->ifi_flags = dev_get_flags(dev);
ifm->ifi_change = change;

if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_IF_NETNSID, tgt_netnsid))
if (tgt_netnsid >= 0 && nla_put_s32(skb, IFLA_TARGET_NETNSID, tgt_netnsid))
goto nla_put_failure;

if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
Expand Down Expand Up @@ -1737,7 +1737,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_XDP] = { .type = NLA_NESTED },
[IFLA_EVENT] = { .type = NLA_U32 },
[IFLA_GROUP] = { .type = NLA_U32 },
[IFLA_IF_NETNSID] = { .type = NLA_S32 },
[IFLA_TARGET_NETNSID] = { .type = NLA_S32 },
[IFLA_CARRIER_UP_COUNT] = { .type = NLA_U32 },
[IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 },
[IFLA_MIN_MTU] = { .type = NLA_U32 },
Expand Down Expand Up @@ -1845,7 +1845,15 @@ static bool link_dump_filtered(struct net_device *dev,
return false;
}

static struct net *get_target_net(struct sock *sk, int netnsid)
/**
* rtnl_get_net_ns_capable - Get netns if sufficiently privileged.
* @sk: netlink socket
* @netnsid: network namespace identifier
*
* Returns the network namespace identified by netnsid on success or an error
* pointer on failure.
*/
struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid)
{
struct net *net;

Expand All @@ -1862,6 +1870,7 @@ static struct net *get_target_net(struct sock *sk, int netnsid)
}
return net;
}
EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable);

static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
Expand Down Expand Up @@ -1895,9 +1904,9 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)

if (nlmsg_parse(cb->nlh, hdrlen, tb, IFLA_MAX,
ifla_policy, NULL) >= 0) {
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
tgt_net = get_target_net(skb->sk, netnsid);
if (tb[IFLA_TARGET_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
if (IS_ERR(tgt_net)) {
tgt_net = net;
netnsid = -1;
Expand Down Expand Up @@ -1984,7 +1993,7 @@ EXPORT_SYMBOL(rtnl_link_get_net);
*
* 1. IFLA_NET_NS_PID
* 2. IFLA_NET_NS_FD
* 3. IFLA_IF_NETNSID
* 3. IFLA_TARGET_NETNSID
*/
static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net,
struct nlattr *tb[])
Expand All @@ -1994,10 +2003,10 @@ static struct net *rtnl_link_get_net_by_nlattr(struct net *src_net,
if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD])
return rtnl_link_get_net(src_net, tb);

if (!tb[IFLA_IF_NETNSID])
if (!tb[IFLA_TARGET_NETNSID])
return get_net(src_net);

net = get_net_ns_by_id(src_net, nla_get_u32(tb[IFLA_IF_NETNSID]));
net = get_net_ns_by_id(src_net, nla_get_u32(tb[IFLA_TARGET_NETNSID]));
if (!net)
return ERR_PTR(-EINVAL);

Expand Down Expand Up @@ -2038,13 +2047,13 @@ static int rtnl_ensure_unique_netns(struct nlattr *tb[],
return -EOPNOTSUPP;
}

if (tb[IFLA_IF_NETNSID] && (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]))
if (tb[IFLA_TARGET_NETNSID] && (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD]))
goto invalid_attr;

if (tb[IFLA_NET_NS_PID] && (tb[IFLA_IF_NETNSID] || tb[IFLA_NET_NS_FD]))
if (tb[IFLA_NET_NS_PID] && (tb[IFLA_TARGET_NETNSID] || tb[IFLA_NET_NS_FD]))
goto invalid_attr;

if (tb[IFLA_NET_NS_FD] && (tb[IFLA_IF_NETNSID] || tb[IFLA_NET_NS_PID]))
if (tb[IFLA_NET_NS_FD] && (tb[IFLA_TARGET_NETNSID] || tb[IFLA_NET_NS_PID]))
goto invalid_attr;

return 0;
Expand Down Expand Up @@ -2320,7 +2329,7 @@ static int do_setlink(const struct sk_buff *skb,
if (err < 0)
return err;

if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_IF_NETNSID]) {
if (tb[IFLA_NET_NS_PID] || tb[IFLA_NET_NS_FD] || tb[IFLA_TARGET_NETNSID]) {
struct net *net = rtnl_link_get_net_capable(skb, dev_net(dev),
tb, CAP_NET_ADMIN);
if (IS_ERR(net)) {
Expand Down Expand Up @@ -2763,9 +2772,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[IFLA_IFNAME])
nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);

if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
if (tb[IFLA_TARGET_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
Expand Down Expand Up @@ -3173,9 +3182,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (err < 0)
return err;

if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
if (tb[IFLA_TARGET_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_TARGET_NETNSID]);
tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
Expand Down Expand Up @@ -3260,13 +3269,13 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
{
int idx;
int s_idx = cb->family;
int type = cb->nlh->nlmsg_type - RTM_BASE;

if (s_idx == 0)
s_idx = 1;

for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
struct rtnl_link **tab;
int type = cb->nlh->nlmsg_type-RTM_BASE;
struct rtnl_link *link;
rtnl_dumpit_func dumpit;

Expand Down
62 changes: 51 additions & 11 deletions net/ipv4/devinet.c
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,15 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
[IFA_CACHEINFO] = { .len = sizeof(struct ifa_cacheinfo) },
[IFA_FLAGS] = { .type = NLA_U32 },
[IFA_RT_PRIORITY] = { .type = NLA_U32 },
[IFA_TARGET_NETNSID] = { .type = NLA_S32 },
};

struct inet_fill_args {
u32 portid;
u32 seq;
int event;
unsigned int flags;
int netnsid;
};

#define IN4_ADDR_HSIZE_SHIFT 8
Expand Down Expand Up @@ -1584,13 +1593,14 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
}

static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
u32 portid, u32 seq, int event, unsigned int flags)
struct inet_fill_args *args)
{
struct ifaddrmsg *ifm;
struct nlmsghdr *nlh;
u32 preferred, valid;

nlh = nlmsg_put(skb, portid, seq, event, sizeof(*ifm), flags);
nlh = nlmsg_put(skb, args->portid, args->seq, args->event, sizeof(*ifm),
args->flags);
if (!nlh)
return -EMSGSIZE;

Expand All @@ -1601,6 +1611,10 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
ifm->ifa_scope = ifa->ifa_scope;
ifm->ifa_index = ifa->ifa_dev->dev->ifindex;

if (args->netnsid >= 0 &&
nla_put_s32(skb, IFA_TARGET_NETNSID, args->netnsid))
goto nla_put_failure;

if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
preferred = ifa->ifa_preferred_lft;
valid = ifa->ifa_valid_lft;
Expand Down Expand Up @@ -1647,7 +1661,16 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,

static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
struct inet_fill_args fillargs = {
.portid = NETLINK_CB(cb->skb).portid,
.seq = cb->nlh->nlmsg_seq,
.event = RTM_NEWADDR,
.flags = NLM_F_MULTI,
.netnsid = -1,
};
struct net *net = sock_net(skb->sk);
struct nlattr *tb[IFA_MAX+1];
struct net *tgt_net = net;
int h, s_h;
int idx, s_idx;
int ip_idx, s_ip_idx;
Expand All @@ -1660,12 +1683,24 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
s_idx = idx = cb->args[1];
s_ip_idx = ip_idx = cb->args[2];

if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
ifa_ipv4_policy, NULL) >= 0) {
if (tb[IFA_TARGET_NETNSID]) {
fillargs.netnsid = nla_get_s32(tb[IFA_TARGET_NETNSID]);

tgt_net = rtnl_get_net_ns_capable(skb->sk,
fillargs.netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
}

for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
head = &net->dev_index_head[h];
head = &tgt_net->dev_index_head[h];
rcu_read_lock();
cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
net->dev_base_seq;
cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^
tgt_net->dev_base_seq;
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
Expand All @@ -1679,10 +1714,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
ifa = ifa->ifa_next, ip_idx++) {
if (ip_idx < s_ip_idx)
continue;
if (inet_fill_ifaddr(skb, ifa,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWADDR, NLM_F_MULTI) < 0) {
if (inet_fill_ifaddr(skb, ifa, &fillargs) < 0) {
rcu_read_unlock();
goto done;
}
Expand All @@ -1698,15 +1730,23 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[0] = h;
cb->args[1] = idx;
cb->args[2] = ip_idx;
if (fillargs.netnsid >= 0)
put_net(tgt_net);

return skb->len;
}

static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
u32 portid)
{
struct inet_fill_args fillargs = {
.portid = portid,
.seq = nlh ? nlh->nlmsg_seq : 0,
.event = event,
.flags = 0,
.netnsid = -1,
};
struct sk_buff *skb;
u32 seq = nlh ? nlh->nlmsg_seq : 0;
int err = -ENOBUFS;
struct net *net;

Expand All @@ -1715,7 +1755,7 @@ static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
if (!skb)
goto errout;

err = inet_fill_ifaddr(skb, ifa, portid, seq, event, 0);
err = inet_fill_ifaddr(skb, ifa, &fillargs);
if (err < 0) {
/* -EMSGSIZE implies BUG in inet_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
Expand Down
Loading

0 comments on commit 6ef848e

Please sign in to comment.