Merge branch 'net-sctp-Avoid-allocating-high-order-memory-with-kmalloc'
Konstantin Khorenko says:

====================
net/sctp: Avoid allocating high order memory with kmalloc()

Each SCTP association can have up to 65535 input and output streams.
For each stream type an array of sctp_stream_in or sctp_stream_out
structures is allocated using the kmalloc_array() function. This
function allocates physically contiguous memory, so it can require
memory regions of very high order, e.g.:

  sizeof(struct sctp_stream_out) == 24,
  ((65535 * 24) / 4096) == 383 full memory pages plus a partial one
  (4096 bytes per page), i.e. 384 pages in total,
  which means 9th memory order (512 contiguous pages).

This can lead to memory allocation failures on systems
under memory pressure.
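
The arithmetic above can be checked with a small userspace sketch. The
helper names pages_needed() and order_for() are hypothetical and only
mirror the idea behind the kernel's order calculation; the 24-byte
element size and the 4096-byte page size are the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pages needed to hold `bytes`, rounded up. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
	assert(page_size > 0);
	return (bytes + page_size - 1) / page_size;
}

/* Hypothetical helper: smallest n such that (1 << n) pages cover `bytes`. */
static unsigned int order_for(size_t bytes, size_t page_size)
{
	size_t pages = pages_needed(bytes, page_size);
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

pages_needed((size_t)65535 * 24, 4096) evaluates to 384 once the partial
last page is rounded up, and order_for() returns 9 for the same input,
i.e. the allocator would have to find a contiguous 512-page block.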

We actually do not need these arrays of memory to be physically
contiguous. A simple solution would be to use kvmalloc() instead
of kmalloc(), as kvmalloc() can allocate physically scattered
pages if contiguous pages are not available. But the problem
is that the allocation can happen in a softirq context with
the GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario.

So the other possible solution is to use flexible arrays instead of
contiguous arrays of memory so that the memory would be allocated
on a per-page basis.
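
To illustrate the per-page idea, here is a minimal userspace sketch of a
page-granular array. It is not the kernel's flex_array implementation:
the paged_array type and its functions are hypothetical, calloc() stands
in for page allocation, and all parts are allocated up front for
simplicity.

```c
#include <assert.h>
#include <stdlib.h>

#define PART_SIZE 4096u	/* assumed page size */

/* Hypothetical type: elements live in separately allocated parts. */
struct paged_array {
	size_t elem_size;
	size_t total;		/* number of elements */
	size_t per_part;	/* elements that fit in one part */
	size_t nparts;
	void **parts;
};

static void paged_array_free(struct paged_array *pa)
{
	size_t i;

	if (!pa)
		return;
	if (pa->parts)
		for (i = 0; i < pa->nparts; i++)
			free(pa->parts[i]);	/* free(NULL) is a no-op */
	free(pa->parts);
	free(pa);
}

static struct paged_array *paged_array_alloc(size_t elem_size, size_t total)
{
	struct paged_array *pa;
	size_t i;

	if (elem_size == 0 || elem_size > PART_SIZE)
		return NULL;
	pa = calloc(1, sizeof(*pa));
	if (!pa)
		return NULL;
	pa->elem_size = elem_size;
	pa->total = total;
	pa->per_part = PART_SIZE / elem_size;
	pa->nparts = (total + pa->per_part - 1) / pa->per_part;
	pa->parts = calloc(pa->nparts ? pa->nparts : 1, sizeof(*pa->parts));
	if (!pa->parts) {
		paged_array_free(pa);
		return NULL;
	}
	/* Each part is an independent page-sized allocation. */
	for (i = 0; i < pa->nparts; i++) {
		pa->parts[i] = calloc(1, PART_SIZE);
		if (!pa->parts[i]) {
			paged_array_free(pa);
			return NULL;
		}
	}
	return pa;
}

/* Return a pointer to element `idx`, or NULL if it is out of range. */
static void *paged_array_get(const struct paged_array *pa, size_t idx)
{
	assert(pa != NULL);
	if (idx >= pa->total)
		return NULL;
	return (char *)pa->parts[idx / pa->per_part] +
	       (idx % pa->per_part) * pa->elem_size;
}
```

With 24-byte elements, 170 fit in each 4096-byte part, so 65535 elements
need 386 single-page allocations instead of one physically contiguous
region.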

This patchset replaces kmalloc_array() with flex_array usage.
It consists of two parts:

  * First patch is preparatory - it mechanically wraps all direct
    access to assoc->stream.out[] and assoc->stream.in[] arrays
    with SCTP_SO() and SCTP_SI() wrappers so that later a direct
    array access could be easily changed to an access to a
    flex_array (or any other possible alternative).
  * Second patch replaces kmalloc_array() with flex_array usage.
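
The preparatory step can be pictured with the following sketch. The
demo_* names are hypothetical stand-ins, not the kernel structures; the
point is only that call sites go through an accessor, so the storage
behind it can change later without touching them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-stream state and its container. */
struct demo_stream_out {
	unsigned short ssn;
	unsigned char state;
};

struct demo_stream {
	struct demo_stream_out *out;	/* a plain array in this sketch */
	unsigned short outcnt;
};

/* The only place that knows how `out` is stored. */
static inline struct demo_stream_out *demo_stream_out(
		const struct demo_stream *stream, unsigned short sid)
{
	assert(sid < stream->outcnt);
	return &stream->out[sid];
}

/* Call sites use the wrapper instead of stream->out[sid]. */
#define DEMO_SO(s, i) demo_stream_out((s), (i))

/* Example call site: unaffected if demo_stream_out() changes later. */
static unsigned short demo_next_ssn(struct demo_stream *stream,
				    unsigned short sid)
{
	return DEMO_SO(stream, sid)->ssn++;
}
```

Swapping the array for another container would then only require
changing the body of demo_stream_out() and the type of the `out` member.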

v2 changes:
 sctp_stream_in() users are updated to provide stream as an argument,
 sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}().

v3 changes:
 Move type changes struct sctp_stream_out -> flex_array to next patch.
 Make sctp_stream_{in,out}() static inline and move them to a header.

Performance results (single stream):
====================================
  * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread)
  * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
          RAM: 32 Gb

  * netperf: taken from https://github.com/HewlettPackard/netperf.git,
	     compiled from sources with sctp support
  * netperf server and client are run on the same node
  * ip link set lo mtu 1500

The script used to run tests:
 # cat run_tests.sh
 #!/bin/bash

for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do
  echo "TEST: $test";
  for i in `seq 1 3`; do
    echo "Iteration: $i";
    set -x
    netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \
            -l 60 -- -m 1452;
    set +x
  done
done
================================================

Results (a bit reformatted to be more readable):
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

				v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_STREAM
212992 212992   1452    60.21	1125.52		1247.04
212992 212992   1452    60.20	1376.38		1149.95
212992 212992   1452    60.20	1131.40		1163.85
TEST: SCTP_STREAM_MANY
212992 212992   1452    60.00	1111.00		1310.05
212992 212992   1452    60.00	1188.55		1130.50
212992 212992   1452    60.00	1108.06		1162.50

===========
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

					v4.18-rc7	v4.18-rc7 + fixes
TEST: SCTP_RR
212992 212992 1        1       60.00	45486.98	46089.43
212992 212992 1        1       60.00	45584.18	45994.21
212992 212992 1        1       60.00	45703.86	45720.84
TEST: SCTP_RR_MANY
212992 212992 1        1       60.00	40.75		40.77
212992 212992 1        1       60.00	40.58		40.08
212992 212992 1        1       60.00	39.98		39.97

Performance results for many streams:
=====================================
   * Kernel: v4.18-rc8 - stock and with 2 patches v3
   * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
           RAM: 32 Gb

   * sctp_test: https://github.com/sctp/lksctp-tools
   * both server and client are run on the same node
   * ip link set lo mtu 1500
   * sysctl -w vm.max_map_count=65530000 (needed to make memory fragmented)

The script used to run tests:
=============================
 # cat run_sctp_test.sh
 #!/bin/bash

set -x

uname -r
ip link set lo mtu 1500
swapoff -a

free
cat /proc/buddyinfo

./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 &
sleep 3

time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \
         -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null

killall -9 lt-sctp_test
===============================

Results (a bit reformatted to be more readable):

1) ms stock kernel v4.18-rc8, no memory fragmentation
	test 1		test 2		test 3
real    0m14.715s	0m14.593s	0m15.954s
user    0m0.954s	0m0.955s	0m0.854s
sys     0m13.388s	0m12.537s	0m13.749s

2) kernel with fixes, no memory fragmentation
	test 1		test 2		test 3
real    0m14.959s	0m14.693s	0m14.762s
user    0m0.948s	0m0.921s	0m0.929s
sys     0m13.538s	0m13.225s	0m13.217s

3) kernel with fixes, memory fragmented
'free':
               total        used        free      shared  buff/cache   available
Mem:       32906008    30555200      302740         764     2048068      266452
Mem:       32906008    30379948      541436         764     1984624      442376
Mem:       32906008    30717312      262380         764     1926316      109908

/proc/buddyinfo:
Node 0, zone   Normal  40773     37     34     29      0      0      0      0      0      0      0
Node 0, zone   Normal 100332     68      8      4      2      1      1      0      0      0      0
Node 0, zone   Normal  31113      7      2      1      0      0      0      0      0      0      0

	test 1		test 2		test 3
real    0m14.159s	0m15.252s	0m15.826s
user    0m0.839s	0m1.004s	0m1.048s
sys     0m11.827s	0m14.240s	0m14.778s
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Aug 11, 2018
2 parents b70f1f3 + 0d493b4 commit 2b14e1e
Showing 9 changed files with 172 additions and 105 deletions.
40 changes: 28 additions & 12 deletions include/net/sctp/structs.h
@@ -57,6 +57,7 @@
 #include <linux/atomic.h> /* This gets us atomic counters. */
 #include <linux/skbuff.h> /* We need sk_buff_head. */
 #include <linux/workqueue.h> /* We need tq_struct. */
+#include <linux/flex_array.h> /* We need flex_array. */
 #include <linux/sctp.h> /* We need sctp* header structs. */
 #include <net/sctp/auth.h> /* We need auth specific structs */
 #include <net/ip.h> /* For inet_skb_parm */
@@ -398,37 +399,35 @@ void sctp_stream_update(struct sctp_stream *stream, struct sctp_stream *new);

 /* What is the current SSN number for this stream? */
 #define sctp_ssn_peek(stream, type, sid) \
-	((stream)->type[sid].ssn)
+	(sctp_stream_##type((stream), (sid))->ssn)

 /* Return the next SSN number for this stream. */
 #define sctp_ssn_next(stream, type, sid) \
-	((stream)->type[sid].ssn++)
+	(sctp_stream_##type((stream), (sid))->ssn++)

 /* Skip over this ssn and all below. */
 #define sctp_ssn_skip(stream, type, sid, ssn) \
-	((stream)->type[sid].ssn = ssn + 1)
+	(sctp_stream_##type((stream), (sid))->ssn = ssn + 1)

 /* What is the current MID number for this stream? */
 #define sctp_mid_peek(stream, type, sid) \
-	((stream)->type[sid].mid)
+	(sctp_stream_##type((stream), (sid))->mid)

 /* Return the next MID number for this stream. */
 #define sctp_mid_next(stream, type, sid) \
-	((stream)->type[sid].mid++)
+	(sctp_stream_##type((stream), (sid))->mid++)

 /* Skip over this mid and all below. */
 #define sctp_mid_skip(stream, type, sid, mid) \
-	((stream)->type[sid].mid = mid + 1)
-
-#define sctp_stream_in(asoc, sid) (&(asoc)->stream.in[sid])
+	(sctp_stream_##type((stream), (sid))->mid = mid + 1)

 /* What is the current MID_uo number for this stream? */
 #define sctp_mid_uo_peek(stream, type, sid) \
-	((stream)->type[sid].mid_uo)
+	(sctp_stream_##type((stream), (sid))->mid_uo)

 /* Return the next MID_uo number for this stream. */
 #define sctp_mid_uo_next(stream, type, sid) \
-	((stream)->type[sid].mid_uo++)
+	(sctp_stream_##type((stream), (sid))->mid_uo++)

 /*
  * Pointers to address related SCTP functions.
@@ -1440,8 +1439,8 @@ struct sctp_stream_in {
 };

 struct sctp_stream {
-	struct sctp_stream_out *out;
-	struct sctp_stream_in *in;
+	struct flex_array *out;
+	struct flex_array *in;
 	__u16 outcnt;
 	__u16 incnt;
 	/* Current stream being sent, if any */
@@ -1463,6 +1462,23 @@ struct sctp_stream {
 	struct sctp_stream_interleave *si;
 };

+static inline struct sctp_stream_out *sctp_stream_out(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return flex_array_get(stream->out, sid);
+}
+
+static inline struct sctp_stream_in *sctp_stream_in(
+	const struct sctp_stream *stream,
+	__u16 sid)
+{
+	return flex_array_get(stream->in, sid);
+}
+
+#define SCTP_SO(s, i) sctp_stream_out((s), (i))
+#define SCTP_SI(s, i) sctp_stream_in((s), (i))
+
 #define SCTP_STREAM_CLOSED 0x00
 #define SCTP_STREAM_OPEN 0x01

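The sctp_ssn_*() and sctp_mid_*() macros in structs.h rely on
preprocessor token pasting: sctp_stream_##type expands to sctp_stream_in
or sctp_stream_out depending on the `type` argument. A standalone sketch
of the same mechanism, using hypothetical toy_* names and plain arrays
rather than the kernel types:

```c
#include <assert.h>

struct toy_slot { unsigned short ssn; };

struct toy_stream {
	struct toy_slot in[2];
	struct toy_slot out[2];
};

static inline struct toy_slot *toy_stream_in(struct toy_stream *s,
					     unsigned short sid)
{
	assert(sid < 2);
	return &s->in[sid];
}

static inline struct toy_slot *toy_stream_out(struct toy_stream *s,
					      unsigned short sid)
{
	assert(sid < 2);
	return &s->out[sid];
}

/* `type` is pasted onto toy_stream_, selecting the accessor by name. */
#define toy_ssn_peek(stream, type, sid) \
	(toy_stream_##type((stream), (sid))->ssn)
#define toy_ssn_next(stream, type, sid) \
	(toy_stream_##type((stream), (sid))->ssn++)
```

Here toy_ssn_next(&s, out, 1) becomes toy_stream_out(&s, 1)->ssn++, so
the macro call sites keep their existing (stream, type, sid) arguments.
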
6 changes: 4 additions & 2 deletions net/sctp/chunk.c
@@ -325,7 +325,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
 	    time_after(jiffies, chunk->msg->expires_at)) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);

 		if (chunk->sent_count) {
 			chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
@@ -339,7 +340,8 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
 	} else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
 		   chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
 		struct sctp_stream_out *streamout =
-			&chunk->asoc->stream.out[chunk->sinfo.sinfo_stream];
+			SCTP_SO(&chunk->asoc->stream,
+				chunk->sinfo.sinfo_stream);

 		chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
11 changes: 6 additions & 5 deletions net/sctp/outqueue.c
@@ -80,7 +80,7 @@ static inline void sctp_outq_head_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;

 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add(&ch->stream_list, &oute->outq);
 }

@@ -101,7 +101,7 @@ static inline void sctp_outq_tail_data(struct sctp_outq *q,
 	q->out_qlen += ch->skb->len;

 	stream = sctp_chunk_stream_no(ch);
-	oute = q->asoc->stream.out[stream].ext;
+	oute = SCTP_SO(&q->asoc->stream, stream)->ext;
 	list_add_tail(&ch->stream_list, &oute->outq);
 }

@@ -372,7 +372,7 @@ static int sctp_prsctp_prune_sent(struct sctp_association *asoc,
 		sctp_insert_list(&asoc->outqueue.abandoned,
 				 &chk->transmitted_list);

-		streamout = &asoc->stream.out[chk->sinfo.sinfo_stream];
+		streamout = SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);
 		asoc->sent_cnt_removable--;
 		asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
 		streamout->ext->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
@@ -416,7 +416,7 @@ static int sctp_prsctp_prune_unsent(struct sctp_association *asoc,
 		asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		if (chk->sinfo.sinfo_stream < asoc->stream.outcnt) {
 			struct sctp_stream_out *streamout =
-				&asoc->stream.out[chk->sinfo.sinfo_stream];
+				SCTP_SO(&asoc->stream, chk->sinfo.sinfo_stream);

 			streamout->ext->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
 		}
@@ -1082,6 +1082,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	/* Finally, transmit new packets. */
 	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
+		__u8 stream_state = SCTP_SO(&ctx->asoc->stream, sid)->state;

 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
@@ -1091,7 +1092,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}

-		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (stream_state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
4 changes: 2 additions & 2 deletions net/sctp/socket.c
@@ -1911,7 +1911,7 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
 		goto err;
 	}

-	if (unlikely(!asoc->stream.out[sinfo->sinfo_stream].ext)) {
+	if (unlikely(!SCTP_SO(&asoc->stream, sinfo->sinfo_stream)->ext)) {
 		err = sctp_stream_init_ext(&asoc->stream, sinfo->sinfo_stream);
 		if (err)
 			goto err;
@@ -7154,7 +7154,7 @@ static int sctp_getsockopt_pr_streamstatus(struct sock *sk, int len,
 	if (!asoc || params.sprstat_sid >= asoc->stream.outcnt)
 		goto out;

-	streamoute = asoc->stream.out[params.sprstat_sid].ext;
+	streamoute = SCTP_SO(&asoc->stream, params.sprstat_sid)->ext;
 	if (!streamoute) {
 		/* Not allocated yet, means all stats are 0 */
 		params.sprstat_abandoned_unsent = 0;