Skip to content

Commit

Permalink
Merge branch 'ipfrags'
Browse files Browse the repository at this point in the history
Jesper Dangaard Brouer says:

====================
This patchset is V2, with some trivial code fixes, which were noticed
by DaveM. It is still a partly respin of my fragmentation optimization
patches: http://thread.gmane.org/gmane.linux.network/250914

This is not the complete patchset, from the gmane link above. In this
patchset, I primarily focus on adjusting cacheline for better SMP/NUMA
performance.

Once this patchset have been agreed upon, I will continue and respin
the rest of my patches.

This time around, I have created a frag DoS generator, via the tool
trafgen (http://netsniff-ng.org/).  To create a stable DoS scenario
(no longer relying on frame dropping due to disabled flow-control).

Two 10G interfaces are under-test, and uses Ethernet flow-control.  A
third interface is used for generating the DoS attack (this interface
is also 10G, but it does not need to be, as 500Kpps DoS is enough).

Test types summary (netperf):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS

Patch list:
 Patch-01 - net: cacheline adjust struct netns_frags for better frag performance
 Patch-02 - net: cacheline adjust struct inet_frags for better frag performance
 Patch-03 - net: cacheline adjust struct inet_frag_queue
 Patch-04 - net: frag helper functions for mem limit tracking
 Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting
 Patch-06 - net: frag, move LRU list maintenance outside of rwlock

Performance table summary:

 Test-type:  Test-20G64K    Test-20G3F  20G64K+DoS   20G3F+DoS
 ----------  -----------    ----------  ----------   ---------
  net-next:  15114.5 Mbit/s   8954.21     2444.28     3918.01 Mbit/s
  Patch-01:  16075.8 Mbit/s   8976.18     2621.49     4072.79 Mbit/s
  Patch-02:  17806.9 Mbit/s   9280.32     2478.62     4274.59 Mbit/s
  Patch-03:  17317.4 Mbit/s   9308.62     2546.05     4336.59 Mbit/s
  Patch-04:  17635.9 Mbit/s   9256.16     2535.25     4327.63 Mbit/s
  Patch-05:  18027.0 Mbit/s   9918.99     2492.62     3621.68 Mbit/s
  Patch-06:  18486.7 Mbit/s  10723.20     3657.85     4560.64 Mbit/s

 I cannot explain the under-DoS regression that patch-05/percpu_counter
 introduces.  But patch-06/LRU-lock corrects the situation again.

Below is a testlab setup description, with links to the trafgen DoS
packet config used.

Testlab
=======

Server setup
------------
The machine acting as a server:
 - 2x CPU (E5-2630)
 - Thus a NUMA arch/machine
 - 4x 10Gbit/s ports
 - NICs 2x Intel Dual port 82599 based (driver ixgbe)

Setup:
 - Interfaces uses Ethernet flow control
 - Flush all iptables
 - Remove all iptables related module.
 - Kill irqbalance
 - Pin each 10G NIC port to a *single* CPU each

Pinning can easily be done by command hacks::

 for x in /proc/irq/*/eth8*/../smp_affinity_list ; do echo 1 > $x; done
 for x in /proc/irq/*/eth9*/../smp_affinity_list ; do echo 3 > $x; done
 for x in /proc/irq/*/eth31*/../smp_affinity_list; do echo 6 > $x; done
 for x in /proc/irq/*/eth32*/../smp_affinity_list; do echo 8 > $x; done

Notice NUMA setting: The CPU to NIC tying is carefully choosen
according to the NUMA node setup.  Thus, NICs connected to a PCI-e
slot that is connected to a physical CPU socket are tied together.

Choosing only a single CPU per NIC (port) is just to ease provoking
and debugging this performance issue. (In real setups, you can choose
more CPU, just remember the NUMA node in the equation).

Tools
-----

Netperf is used, with option -T to ensure CPU binding.
The netserver processes, are NAPI pinned::

 numactl -m0 -c0 netserver
 numactl -m1 -c 1 netserver -p 1337

I now have a frag DoS generator, created via the tool:
  trafgen (see: http://netsniff-ng.org/)

Trafgen packet config file:
 http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf

Notice, I'm using features of trafgen, recently developed by Daniel
Borkmann, thus you need the latest git tree to use my trafgen packet
config.

 git://github.com/borkmann/netsniff-ng.git

Command line:
 trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2

Tests types
-----------

Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
 netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
 netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
 wait $! && tail -n3 ${LOG}.* && \
 tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'

Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 export SIZE=$((3*1472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
 netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
 netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
 wait $! && tail -n3 ${LOG}.* && \
tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'

Awk script for summming results:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jan 29, 2013
2 parents 656a05c + 3ef0eb0 commit 5a1dc31
Show file tree
Hide file tree
Showing 6 changed files with 118 additions and 56 deletions.
84 changes: 75 additions & 9 deletions include/net/inet_frag.h
Original file line number Diff line number Diff line change
@@ -1,10 +1,17 @@
#ifndef __NET_FRAG_H__
#define __NET_FRAG_H__

#include <linux/percpu_counter.h>

struct netns_frags {
int nqueues;
atomic_t mem;
struct list_head lru_list;
spinlock_t lru_lock;

/* The percpu_counter "mem" need to be cacheline aligned.
* mem.count must not share cacheline with other writers
*/
struct percpu_counter mem ____cacheline_aligned_in_smp;

/* sysctls */
int timeout;
Expand All @@ -13,12 +20,11 @@ struct netns_frags {
};

struct inet_frag_queue {
struct hlist_node list;
struct netns_frags *net;
struct list_head lru_list; /* lru list member */
spinlock_t lock;
atomic_t refcnt;
struct timer_list timer; /* when will this queue expire? */
struct list_head lru_list; /* lru list member */
struct hlist_node list;
atomic_t refcnt;
struct sk_buff *fragments; /* list of received fragments */
struct sk_buff *fragments_tail;
ktime_t stamp;
Expand All @@ -31,24 +37,29 @@ struct inet_frag_queue {
#define INET_FRAG_LAST_IN 1

u16 max_size;

struct netns_frags *net;
};

#define INETFRAGS_HASHSZ 64

struct inet_frags {
struct hlist_head hash[INETFRAGS_HASHSZ];
rwlock_t lock;
u32 rnd;
int qsize;
/* This rwlock is a global lock (seperate per IPv4, IPv6 and
* netfilter). Important to keep this on a seperate cacheline.
*/
rwlock_t lock ____cacheline_aligned_in_smp;
int secret_interval;
struct timer_list secret_timer;
u32 rnd;
int qsize;

unsigned int (*hashfn)(struct inet_frag_queue *);
bool (*match)(struct inet_frag_queue *q, void *arg);
void (*constructor)(struct inet_frag_queue *q,
void *arg);
void (*destructor)(struct inet_frag_queue *);
void (*skb_free)(struct sk_buff *);
bool (*match)(struct inet_frag_queue *q, void *arg);
void (*frag_expire)(unsigned long data);
};

Expand All @@ -72,4 +83,59 @@ static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f
inet_frag_destroy(q, f, NULL);
}

/* Memory Tracking Functions. */

/* The default percpu_counter batch size is not big enough to scale to
* fragmentation mem acct sizes.
* The mem size of a 64K fragment is approx:
* (44 fragments * 2944 truesize) + frag_queue struct(200) = 129736 bytes
*/
static unsigned int frag_percpu_counter_batch = 130000;

static inline int frag_mem_limit(struct netns_frags *nf)
{
return percpu_counter_read(&nf->mem);
}

static inline void sub_frag_mem_limit(struct inet_frag_queue *q, int i)
{
__percpu_counter_add(&q->net->mem, -i, frag_percpu_counter_batch);
}

static inline void add_frag_mem_limit(struct inet_frag_queue *q, int i)
{
__percpu_counter_add(&q->net->mem, i, frag_percpu_counter_batch);
}

static inline void init_frag_mem_limit(struct netns_frags *nf)
{
percpu_counter_init(&nf->mem, 0);
}

static inline int sum_frag_mem_limit(struct netns_frags *nf)
{
return percpu_counter_sum_positive(&nf->mem);
}

static inline void inet_frag_lru_move(struct inet_frag_queue *q)
{
spin_lock(&q->net->lru_lock);
list_move_tail(&q->lru_list, &q->net->lru_list);
spin_unlock(&q->net->lru_lock);
}

static inline void inet_frag_lru_del(struct inet_frag_queue *q)
{
spin_lock(&q->net->lru_lock);
list_del(&q->lru_list);
spin_unlock(&q->net->lru_lock);
}

static inline void inet_frag_lru_add(struct netns_frags *nf,
struct inet_frag_queue *q)
{
spin_lock(&nf->lru_lock);
list_add_tail(&q->lru_list, &nf->lru_list);
spin_unlock(&nf->lru_lock);
}
#endif
2 changes: 1 addition & 1 deletion include/net/ipv6.h
Original file line number Diff line number Diff line change
Expand Up @@ -288,7 +288,7 @@ static inline int ip6_frag_nqueues(struct net *net)

static inline int ip6_frag_mem(struct net *net)
{
return atomic_read(&net->ipv6.frags.mem);
return sum_frag_mem_limit(&net->ipv6.frags);
}
#endif

Expand Down
39 changes: 21 additions & 18 deletions net/ipv4/inet_fragment.c
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,9 @@ EXPORT_SYMBOL(inet_frags_init);
void inet_frags_init_net(struct netns_frags *nf)
{
nf->nqueues = 0;
atomic_set(&nf->mem, 0);
init_frag_mem_limit(nf);
INIT_LIST_HEAD(&nf->lru_list);
spin_lock_init(&nf->lru_lock);
}
EXPORT_SYMBOL(inet_frags_init_net);

Expand All @@ -91,16 +92,18 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
local_bh_disable();
inet_frag_evictor(nf, f, true);
local_bh_enable();

percpu_counter_destroy(&nf->mem);
}
EXPORT_SYMBOL(inet_frags_exit_net);

static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
{
write_lock(&f->lock);
hlist_del(&fq->list);
list_del(&fq->lru_list);
fq->net->nqueues--;
write_unlock(&f->lock);
inet_frag_lru_del(fq);
}

void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
Expand All @@ -117,12 +120,8 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
EXPORT_SYMBOL(inet_frag_kill);

static inline void frag_kfree_skb(struct netns_frags *nf, struct inet_frags *f,
struct sk_buff *skb, int *work)
struct sk_buff *skb)
{
if (work)
*work -= skb->truesize;

atomic_sub(skb->truesize, &nf->mem);
if (f->skb_free)
f->skb_free(skb);
kfree_skb(skb);
Expand All @@ -133,6 +132,7 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
{
struct sk_buff *fp;
struct netns_frags *nf;
unsigned int sum, sum_truesize = 0;

WARN_ON(!(q->last_in & INET_FRAG_COMPLETE));
WARN_ON(del_timer(&q->timer) != 0);
Expand All @@ -143,13 +143,14 @@ void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f,
while (fp) {
struct sk_buff *xp = fp->next;

frag_kfree_skb(nf, f, fp, work);
sum_truesize += fp->truesize;
frag_kfree_skb(nf, f, fp);
fp = xp;
}

sum = sum_truesize + f->qsize;
if (work)
*work -= f->qsize;
atomic_sub(f->qsize, &nf->mem);
*work -= sum;
sub_frag_mem_limit(q, sum);

if (f->destructor)
f->destructor(q);
Expand All @@ -164,22 +165,23 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
int work, evicted = 0;

if (!force) {
if (atomic_read(&nf->mem) <= nf->high_thresh)
if (frag_mem_limit(nf) <= nf->high_thresh)
return 0;
}

work = atomic_read(&nf->mem) - nf->low_thresh;
work = frag_mem_limit(nf) - nf->low_thresh;
while (work > 0) {
read_lock(&f->lock);
spin_lock(&nf->lru_lock);

if (list_empty(&nf->lru_list)) {
read_unlock(&f->lock);
spin_unlock(&nf->lru_lock);
break;
}

q = list_first_entry(&nf->lru_list,
struct inet_frag_queue, lru_list);
atomic_inc(&q->refcnt);
read_unlock(&f->lock);
spin_unlock(&nf->lru_lock);

spin_lock(&q->lock);
if (!(q->last_in & INET_FRAG_COMPLETE))
Expand Down Expand Up @@ -233,9 +235,9 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf,

atomic_inc(&qp->refcnt);
hlist_add_head(&qp->list, &f->hash[hash]);
list_add_tail(&qp->lru_list, &nf->lru_list);
nf->nqueues++;
write_unlock(&f->lock);
inet_frag_lru_add(nf, qp);
return qp;
}

Expand All @@ -250,7 +252,8 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,

q->net = nf;
f->constructor(q, arg);
atomic_add(f->qsize, &nf->mem);
add_frag_mem_limit(q, f->qsize);

setup_timer(&q->timer, f->frag_expire, (unsigned long)q);
spin_lock_init(&q->lock);
atomic_set(&q->refcnt, 1);
Expand Down
28 changes: 12 additions & 16 deletions net/ipv4/ip_fragment.c
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ int ip_frag_nqueues(struct net *net)

int ip_frag_mem(struct net *net)
{
return atomic_read(&net->ipv4.frags.mem);
return sum_frag_mem_limit(&net->ipv4.frags);
}

static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
Expand Down Expand Up @@ -161,13 +161,6 @@ static bool ip4_frag_match(struct inet_frag_queue *q, void *a)
qp->user == arg->user;
}

/* Memory Tracking Functions. */
static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
{
atomic_sub(skb->truesize, &nf->mem);
kfree_skb(skb);
}

static void ip4_frag_init(struct inet_frag_queue *q, void *a)
{
struct ipq *qp = container_of(q, struct ipq, q);
Expand Down Expand Up @@ -340,6 +333,7 @@ static inline int ip_frag_too_far(struct ipq *qp)
static int ip_frag_reinit(struct ipq *qp)
{
struct sk_buff *fp;
unsigned int sum_truesize = 0;

if (!mod_timer(&qp->q.timer, jiffies + qp->q.net->timeout)) {
atomic_inc(&qp->q.refcnt);
Expand All @@ -349,9 +343,12 @@ static int ip_frag_reinit(struct ipq *qp)
fp = qp->q.fragments;
do {
struct sk_buff *xp = fp->next;
frag_kfree_skb(qp->q.net, fp);

sum_truesize += fp->truesize;
kfree_skb(fp);
fp = xp;
} while (fp);
sub_frag_mem_limit(&qp->q, sum_truesize);

qp->q.last_in = 0;
qp->q.len = 0;
Expand Down Expand Up @@ -496,7 +493,8 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->q.fragments = next;

qp->q.meat -= free_it->len;
frag_kfree_skb(qp->q.net, free_it);
sub_frag_mem_limit(&qp->q, free_it->truesize);
kfree_skb(free_it);
}
}

Expand All @@ -519,7 +517,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->q.stamp = skb->tstamp;
qp->q.meat += skb->len;
qp->ecn |= ecn;
atomic_add(skb->truesize, &qp->q.net->mem);
add_frag_mem_limit(&qp->q, skb->truesize);
if (offset == 0)
qp->q.last_in |= INET_FRAG_FIRST_IN;

Expand All @@ -531,9 +529,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->q.meat == qp->q.len)
return ip_frag_reasm(qp, prev, dev);

write_lock(&ip4_frags.lock);
list_move_tail(&qp->q.lru_list, &qp->q.net->lru_list);
write_unlock(&ip4_frags.lock);
inet_frag_lru_move(&qp->q);
return -EINPROGRESS;

err:
Expand Down Expand Up @@ -617,7 +613,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
head->len -= clone->len;
clone->csum = 0;
clone->ip_summed = head->ip_summed;
atomic_add(clone->truesize, &qp->q.net->mem);
add_frag_mem_limit(&qp->q, clone->truesize);
}

skb_push(head, head->data - skb_network_header(head));
Expand Down Expand Up @@ -645,7 +641,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
}
fp = next;
}
atomic_sub(sum_truesize, &qp->q.net->mem);
sub_frag_mem_limit(&qp->q, sum_truesize);

head->next = NULL;
head->dev = dev;
Expand Down
11 changes: 5 additions & 6 deletions net/ipv6/netfilter/nf_conntrack_reasm.c
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->q.meat += skb->len;
if (payload_len > fq->q.max_size)
fq->q.max_size = payload_len;
atomic_add(skb->truesize, &fq->q.net->mem);
add_frag_mem_limit(&fq->q, skb->truesize);

/* The first fragment.
* nhoffset is obtained from the first fragment, of course.
Expand All @@ -328,9 +328,8 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->nhoffset = nhoff;
fq->q.last_in |= INET_FRAG_FIRST_IN;
}
write_lock(&nf_frags.lock);
list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);
write_unlock(&nf_frags.lock);

inet_frag_lru_move(&fq->q);
return 0;

discard_fq:
Expand Down Expand Up @@ -398,7 +397,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
clone->ip_summed = head->ip_summed;

NFCT_FRAG6_CB(clone)->orig = NULL;
atomic_add(clone->truesize, &fq->q.net->mem);
add_frag_mem_limit(&fq->q, clone->truesize);
}

/* We have to remove fragment header from datagram and to relocate
Expand All @@ -422,7 +421,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device *dev)
head->csum = csum_add(head->csum, fp->csum);
head->truesize += fp->truesize;
}
atomic_sub(head->truesize, &fq->q.net->mem);
sub_frag_mem_limit(&fq->q, head->truesize);

head->local_df = 1;
head->next = NULL;
Expand Down
Loading

0 comments on commit 5a1dc31

Please sign in to comment.