Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Browse files Browse the repository at this point in the history
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your
net-next tree. A couple of new features for nf_tables, and unsorted
cleanups and incremental updates for the Netfilter tree. More
specifically, they are:

1) Allow to check for TCP option presence via nft_exthdr, patch
   from Phil Sutter.

2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana.

3) Use pr_cont() in ebt_log, from Joe Perches.

4) Remove some dead code in arp_tables reported via static analysis
   tool, from Colin Ian King.

5) Consolidate nf_tables expression validation, from Liping Zhang.

6) Consolidate set lookup via nft_set_lookup().

7) Remove unnecessary rcu read lock side in bridge netfilter, from
   Florian Westphal.

8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo.

9) Pass nft_ctx struct to object initialization indirections, from
   Florian Westphal.

10) Add code to integrate conntrack helper into nf_tables, also from
    Florian.

11) Allow to check if interface index or name exists via
    NFTA_FIB_F_PRESENT, from Phil Sutter.

12) Simplify resolve_normal_ct(), from Florian.

13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang.

14) Use rwlock in nft_set_rbtree set, also from Liping Zhang.

15) One patch to remove a useless printk at netns init path in ipvs,
    and several patches to document IPVS knobs.

16) Use refcount_t for reference counter in the Netfilter/IPVS code,
    from Elena Reshetova.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Mar 21, 2017
2 parents b9974d7 + 4485a84 commit 41e9573
Show file tree
Hide file tree
Showing 54 changed files with 615 additions and 297 deletions.
68 changes: 60 additions & 8 deletions Documentation/networking/ipvs-sysctl.txt
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,14 @@ nat_icmp_send - BOOLEAN
for VS/NAT when the load balancer receives packets from real
servers but the connection entries don't exist.

pmtu_disc - BOOLEAN
0 - disabled
not 0 - enabled (default)

By default, reject with FRAG_NEEDED all DF packets that exceed
the PMTU, irrespective of the forwarding method. For TUN method
the flag can be disabled to fragment such packets.

secure_tcp - INTEGER
0 - disabled (default)

Expand All @@ -185,15 +193,59 @@ secure_tcp - INTEGER
The value definition is the same as that of drop_entry and
drop_packet.

sync_threshold - INTEGER
default 3
sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
default 3 50

It sets synchronization threshold, which is the minimum number
of incoming packets that a connection needs to receive before
the connection will be synchronized. A connection will be
synchronized, every time the number of its incoming packets
modulus sync_period equals the threshold. The range of the
threshold is from 0 to sync_period.

When sync_period and sync_refresh_period are 0, send sync only
for state changes or only once when pkts matches sync_threshold

sync_refresh_period - UNSIGNED INTEGER
default 0

In seconds, difference in reported connection timer that triggers
new sync message. It can be used to avoid sync messages for the
specified period (or half of the connection timeout if it is lower)
if connection state is not changed since last sync.

This is useful for normal connections with high traffic to reduce
sync rate. Additionally, retry sync_retries times with period of
sync_refresh_period/8.

sync_retries - INTEGER
default 0

Defines sync retries with period of sync_refresh_period/8. Useful
to protect against loss of sync messages. The range of the
sync_retries is from 0 to 3.

sync_qlen_max - UNSIGNED LONG

Hard limit for queued sync messages that are not sent yet. It
defaults to 1/32 of the memory pages but actually represents
number of messages. It will protect us from allocating large
parts of memory when the sending rate is lower than the queuing
rate.

sync_sock_size - INTEGER
default 0

Configuration of SNDBUF (master) or RCVBUF (slave) socket limit.
Default value is 0 (preserve system defaults).

sync_ports - INTEGER
default 1

It sets synchronization threshold, which is the minimum number
of incoming packets that a connection needs to receive before
the connection will be synchronized. A connection will be
synchronized, every time the number of its incoming packets
modulus 50 equals the threshold. The range of the threshold is
from 0 to 49.
The number of threads that master and backup servers can use for
sync traffic. Every thread will use single UDP port, thread 0 will
use the default port 8848 while last thread will use port
8848+sync_ports-1.

snat_reroute - BOOLEAN
0 - disabled
Expand Down
16 changes: 9 additions & 7 deletions include/net/ip_vs.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
#include <linux/list.h> /* for struct list_head */
#include <linux/spinlock.h> /* for struct rwlock_t */
#include <linux/atomic.h> /* for struct atomic_t */
#include <linux/refcount.h> /* for struct refcount_t */

#include <linux/compiler.h>
#include <linux/timer.h>
#include <linux/bug.h>
Expand Down Expand Up @@ -525,7 +527,7 @@ struct ip_vs_conn {
struct netns_ipvs *ipvs;

/* counter and timer */
atomic_t refcnt; /* reference count */
refcount_t refcnt; /* reference count */
struct timer_list timer; /* Expiration timer */
volatile unsigned long timeout; /* timeout */

Expand Down Expand Up @@ -667,7 +669,7 @@ struct ip_vs_dest {
atomic_t conn_flags; /* flags to copy to conn */
atomic_t weight; /* server weight */

atomic_t refcnt; /* reference counter */
refcount_t refcnt; /* reference counter */
struct ip_vs_stats stats; /* statistics */
unsigned long idle_start; /* start time, jiffies */

Expand Down Expand Up @@ -1211,14 +1213,14 @@ struct ip_vs_conn * ip_vs_conn_out_get_proto(struct netns_ipvs *ipvs, int af,
*/
static inline bool __ip_vs_conn_get(struct ip_vs_conn *cp)
{
return atomic_inc_not_zero(&cp->refcnt);
return refcount_inc_not_zero(&cp->refcnt);
}

/* put back the conn without restarting its timer */
static inline void __ip_vs_conn_put(struct ip_vs_conn *cp)
{
smp_mb__before_atomic();
atomic_dec(&cp->refcnt);
refcount_dec(&cp->refcnt);
}
void ip_vs_conn_put(struct ip_vs_conn *cp);
void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport);
Expand Down Expand Up @@ -1410,18 +1412,18 @@ void ip_vs_try_bind_dest(struct ip_vs_conn *cp);

static inline void ip_vs_dest_hold(struct ip_vs_dest *dest)
{
atomic_inc(&dest->refcnt);
refcount_inc(&dest->refcnt);
}

static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
{
smp_mb__before_atomic();
atomic_dec(&dest->refcnt);
refcount_dec(&dest->refcnt);
}

static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
{
if (atomic_dec_and_test(&dest->refcnt))
if (refcount_dec_and_test(&dest->refcnt))
kfree(dest);
}

Expand Down
4 changes: 3 additions & 1 deletion include/net/netfilter/nf_conntrack_expect.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
#ifndef _NF_CONNTRACK_EXPECT_H
#define _NF_CONNTRACK_EXPECT_H

#include <linux/refcount.h>

#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_zones.h>

Expand Down Expand Up @@ -37,7 +39,7 @@ struct nf_conntrack_expect {
struct timer_list timeout;

/* Usage count. */
atomic_t use;
refcount_t use;

/* Flags */
unsigned int flags;
Expand Down
3 changes: 2 additions & 1 deletion include/net/netfilter/nf_conntrack_timeout.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <net/net_namespace.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include <linux/netfilter/nf_conntrack_tuple_common.h>
#include <linux/refcount.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_conntrack_extend.h>

Expand All @@ -12,7 +13,7 @@
struct ctnl_timeout {
struct list_head head;
struct rcu_head rcu_head;
atomic_t refcnt;
refcount_t refcnt;
char name[CTNL_TIMEOUT_NAME_MAX];
__u16 l3num;
struct nf_conntrack_l4proto *l4proto;
Expand Down
12 changes: 7 additions & 5 deletions include/net/netfilter/nf_tables.h
Original file line number Diff line number Diff line change
Expand Up @@ -385,10 +385,11 @@ static inline struct nft_set *nft_set_container_of(const void *priv)
return (void *)priv - offsetof(struct nft_set, data);
}

struct nft_set *nf_tables_set_lookup(const struct nft_table *table,
const struct nlattr *nla, u8 genmask);
struct nft_set *nf_tables_set_lookup_byid(const struct net *net,
const struct nlattr *nla, u8 genmask);
struct nft_set *nft_set_lookup(const struct net *net,
const struct nft_table *table,
const struct nlattr *nla_set_name,
const struct nlattr *nla_set_id,
u8 genmask);

static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
{
Expand Down Expand Up @@ -1016,7 +1017,8 @@ struct nft_object_type {
unsigned int maxattr;
struct module *owner;
const struct nla_policy *policy;
int (*init)(const struct nlattr * const tb[],
int (*init)(const struct nft_ctx *ctx,
const struct nlattr *const tb[],
struct nft_object *obj);
void (*destroy)(struct nft_object *obj);
int (*dump)(struct sk_buff *skb,
Expand Down
2 changes: 1 addition & 1 deletion include/net/netfilter/nft_fib.h
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,6 @@ void nft_fib6_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
void nft_fib6_eval(const struct nft_expr *expr, struct nft_regs *regs,
const struct nft_pktinfo *pkt);

void nft_fib_store_result(void *reg, enum nft_fib_result r,
void nft_fib_store_result(void *reg, const struct nft_fib *priv,
const struct nft_pktinfo *pkt, int index);
#endif
26 changes: 25 additions & 1 deletion include/uapi/linux/netfilter/nf_tables.h
Original file line number Diff line number Diff line change
Expand Up @@ -815,6 +815,17 @@ enum nft_rt_keys {
NFT_RT_NEXTHOP6,
};

/**
* enum nft_hash_types - nf_tables hash expression types
*
* @NFT_HASH_JENKINS: Jenkins Hash
* @NFT_HASH_SYM: Symmetric Hash
*/
enum nft_hash_types {
NFT_HASH_JENKINS,
NFT_HASH_SYM,
};

/**
* enum nft_hash_attributes - nf_tables hash expression netlink attributes
*
Expand All @@ -824,6 +835,7 @@ enum nft_rt_keys {
* @NFTA_HASH_MODULUS: modulus value (NLA_U32)
* @NFTA_HASH_SEED: seed value (NLA_U32)
* @NFTA_HASH_OFFSET: add this offset value to hash result (NLA_U32)
* @NFTA_HASH_TYPE: hash operation (NLA_U32: nft_hash_types)
*/
enum nft_hash_attributes {
NFTA_HASH_UNSPEC,
Expand All @@ -833,6 +845,7 @@ enum nft_hash_attributes {
NFTA_HASH_MODULUS,
NFTA_HASH_SEED,
NFTA_HASH_OFFSET,
NFTA_HASH_TYPE,
__NFTA_HASH_MAX,
};
#define NFTA_HASH_MAX (__NFTA_HASH_MAX - 1)
Expand Down Expand Up @@ -1244,12 +1257,23 @@ enum nft_fib_flags {
NFTA_FIB_F_MARK = 1 << 2, /* use skb->mark */
NFTA_FIB_F_IIF = 1 << 3, /* restrict to iif */
NFTA_FIB_F_OIF = 1 << 4, /* restrict to oif */
NFTA_FIB_F_PRESENT = 1 << 5, /* check existence only */
};

enum nft_ct_helper_attributes {
NFTA_CT_HELPER_UNSPEC,
NFTA_CT_HELPER_NAME,
NFTA_CT_HELPER_L3PROTO,
NFTA_CT_HELPER_L4PROTO,
__NFTA_CT_HELPER_MAX,
};
#define NFTA_CT_HELPER_MAX (__NFTA_CT_HELPER_MAX - 1)

#define NFT_OBJECT_UNSPEC 0
#define NFT_OBJECT_COUNTER 1
#define NFT_OBJECT_QUOTA 2
#define __NFT_OBJECT_MAX 3
#define NFT_OBJECT_CT_HELPER 3
#define __NFT_OBJECT_MAX 4
#define NFT_OBJECT_MAX (__NFT_OBJECT_MAX - 1)

/**
Expand Down
3 changes: 0 additions & 3 deletions net/bridge/br_netfilter_hooks.c
Original file line number Diff line number Diff line change
Expand Up @@ -995,13 +995,10 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,
if (!elem)
return okfn(net, sk, skb);

/* We may already have this, but read-locks nest anyway */
rcu_read_lock();
nf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev,
sk, net, okfn);

ret = nf_hook_slow(skb, &state, elem);
rcu_read_unlock();
if (ret == 1)
ret = okfn(net, sk, skb);

Expand Down
34 changes: 17 additions & 17 deletions net/bridge/netfilter/ebt_log.c
Original file line number Diff line number Diff line change
Expand Up @@ -62,10 +62,10 @@ print_ports(const struct sk_buff *skb, uint8_t protocol, int offset)
pptr = skb_header_pointer(skb, offset,
sizeof(_ports), &_ports);
if (pptr == NULL) {
printk(" INCOMPLETE TCP/UDP header");
pr_cont(" INCOMPLETE TCP/UDP header");
return;
}
printk(" SPT=%u DPT=%u", ntohs(pptr->src), ntohs(pptr->dst));
pr_cont(" SPT=%u DPT=%u", ntohs(pptr->src), ntohs(pptr->dst));
}
}

Expand Down Expand Up @@ -100,11 +100,11 @@ ebt_log_packet(struct net *net, u_int8_t pf, unsigned int hooknum,

ih = skb_header_pointer(skb, 0, sizeof(_iph), &_iph);
if (ih == NULL) {
printk(" INCOMPLETE IP header");
pr_cont(" INCOMPLETE IP header");
goto out;
}
printk(" IP SRC=%pI4 IP DST=%pI4, IP tos=0x%02X, IP proto=%d",
&ih->saddr, &ih->daddr, ih->tos, ih->protocol);
pr_cont(" IP SRC=%pI4 IP DST=%pI4, IP tos=0x%02X, IP proto=%d",
&ih->saddr, &ih->daddr, ih->tos, ih->protocol);
print_ports(skb, ih->protocol, ih->ihl*4);
goto out;
}
Expand All @@ -120,11 +120,11 @@ ebt_log_packet(struct net *net, u_int8_t pf, unsigned int hooknum,

ih = skb_header_pointer(skb, 0, sizeof(_iph), &_iph);
if (ih == NULL) {
printk(" INCOMPLETE IPv6 header");
pr_cont(" INCOMPLETE IPv6 header");
goto out;
}
printk(" IPv6 SRC=%pI6 IPv6 DST=%pI6, IPv6 priority=0x%01X, Next Header=%d",
&ih->saddr, &ih->daddr, ih->priority, ih->nexthdr);
pr_cont(" IPv6 SRC=%pI6 IPv6 DST=%pI6, IPv6 priority=0x%01X, Next Header=%d",
&ih->saddr, &ih->daddr, ih->priority, ih->nexthdr);
nexthdr = ih->nexthdr;
offset_ph = ipv6_skip_exthdr(skb, sizeof(_iph), &nexthdr, &frag_off);
if (offset_ph == -1)
Expand All @@ -142,12 +142,12 @@ ebt_log_packet(struct net *net, u_int8_t pf, unsigned int hooknum,

ah = skb_header_pointer(skb, 0, sizeof(_arph), &_arph);
if (ah == NULL) {
printk(" INCOMPLETE ARP header");
pr_cont(" INCOMPLETE ARP header");
goto out;
}
printk(" ARP HTYPE=%d, PTYPE=0x%04x, OPCODE=%d",
ntohs(ah->ar_hrd), ntohs(ah->ar_pro),
ntohs(ah->ar_op));
pr_cont(" ARP HTYPE=%d, PTYPE=0x%04x, OPCODE=%d",
ntohs(ah->ar_hrd), ntohs(ah->ar_pro),
ntohs(ah->ar_op));

/* If it's for Ethernet and the lengths are OK,
* then log the ARP payload
Expand All @@ -161,17 +161,17 @@ ebt_log_packet(struct net *net, u_int8_t pf, unsigned int hooknum,
ap = skb_header_pointer(skb, sizeof(_arph),
sizeof(_arpp), &_arpp);
if (ap == NULL) {
printk(" INCOMPLETE ARP payload");
pr_cont(" INCOMPLETE ARP payload");
goto out;
}
printk(" ARP MAC SRC=%pM ARP IP SRC=%pI4 ARP MAC DST=%pM ARP IP DST=%pI4",
ap->mac_src, ap->ip_src, ap->mac_dst, ap->ip_dst);
pr_cont(" ARP MAC SRC=%pM ARP IP SRC=%pI4 ARP MAC DST=%pM ARP IP DST=%pI4",
ap->mac_src, ap->ip_src,
ap->mac_dst, ap->ip_dst);
}
}
out:
printk("\n");
pr_cont("\n");
spin_unlock_bh(&ebt_log_lock);

}

static unsigned int
Expand Down
6 changes: 1 addition & 5 deletions net/bridge/netfilter/nft_reject_bridge.c
Original file line number Diff line number Diff line change
Expand Up @@ -375,11 +375,7 @@ static int nft_reject_bridge_init(const struct nft_ctx *ctx,
const struct nlattr * const tb[])
{
struct nft_reject *priv = nft_expr_priv(expr);
int icmp_code, err;

err = nft_reject_bridge_validate(ctx, expr, NULL);
if (err < 0)
return err;
int icmp_code;

if (tb[NFTA_REJECT_TYPE] == NULL)
return -EINVAL;
Expand Down
2 changes: 0 additions & 2 deletions net/ipv4/netfilter/arp_tables.c
Original file line number Diff line number Diff line change
Expand Up @@ -562,8 +562,6 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
XT_ERROR_TARGET) == 0)
++newinfo->stacksize;
}
if (ret != 0)
goto out_free;

ret = -EINVAL;
if (i != repl->num_entries)
Expand Down
Loading

0 comments on commit 41e9573

Please sign in to comment.