net: zerocopy: combine pages in zerocopy_sg_from_iter()
Currently, tcp sendmsg(MSG_ZEROCOPY) builds skbs with order-0 fragments.
Compared to standard sendmsg(), these skbs usually contain up to 16 fragments
on arches with 4KB page sizes, instead of two.

This adds considerable cost in various ndo_start_xmit() handlers,
especially when an IOMMU is in the picture.
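
For context, a driver's ndo_start_xmit() typically DMA-maps every skb
fragment individually, so the per-packet mapping cost scales with the
fragment count. A minimal hypothetical sketch of that pattern follows
(map_skb_frags() is an illustrative name, not code from this commit):

	/* Hypothetical transmit-path excerpt: with an IOMMU, each
	 * fragment costs one mapping, so 16 order-0 frags mean roughly
	 * eight times the mapping work of two frags.
	 */
	static int map_skb_frags(struct device *dev, struct sk_buff *skb)
	{
		int i;

		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
			const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
			dma_addr_t dma;

			dma = skb_frag_dma_map(dev, frag, 0,
					       skb_frag_size(frag),
					       DMA_TO_DEVICE);
			if (dma_mapping_error(dev, dma))
				return -ENOMEM;
			/* ... post dma/size to a hardware descriptor ... */
		}
		return 0;
	}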

As high-performance applications often use huge pages,
we can try to combine adjacent pages belonging to the same
compound page.

Tested on an AMD Rome platform with an IOMMU: the nominal single TCP flow
speed roughly doubles (~55 Gbit -> ~100 Gbit) when the user application
uses huge pages.

For reference, the nominal single TCP flow speed on this platform
without MSG_ZEROCOPY is ~65 Gbit.
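
For illustration, such an application might back its send buffer with a
huge page and enable MSG_ZEROCOPY as in the minimal userspace sketch
below (an assumed setup, not the actual benchmark; reaping the
completion notifications from the socket error queue is omitted):

	#include <string.h>
	#include <sys/mman.h>
	#include <sys/socket.h>

	#ifndef SO_ZEROCOPY
	#define SO_ZEROCOPY 60
	#endif
	#ifndef MSG_ZEROCOPY
	#define MSG_ZEROCOPY 0x4000000
	#endif

	static ssize_t send_zc(int fd, size_t len)
	{
		int one = 1;
		/* A 2MB huge page gives the kernel adjacent subpages of
		 * one compound page, which the patched loop can merge.
		 */
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
				 -1, 0);

		if (buf == MAP_FAILED)
			return -1;
		memset(buf, 'x', len);
		if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
			return -1;
		/* Completions arrive on the socket error queue and must
		 * be reaped by the caller before the buffer is reused.
		 */
		return send(fd, buf, len, MSG_ZEROCOPY);
	}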

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet authored and David S. Miller committed Aug 20, 2020
1 parent 4f6c09f commit 394fcd8
Showing 1 changed file with 29 additions and 4 deletions.
net/core/datagram.c
@@ -623,10 +623,11 @@ int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,

 	while (length && iov_iter_count(from)) {
 		struct page *pages[MAX_SKB_FRAGS];
+		struct page *last_head = NULL;
 		size_t start;
 		ssize_t copied;
 		unsigned long truesize;
-		int n = 0;
+		int refs, n = 0;
 
 		if (frag == MAX_SKB_FRAGS)
 			return -EMSGSIZE;
@@ -649,13 +650,37 @@ int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 		} else {
 			refcount_add(truesize, &skb->sk->sk_wmem_alloc);
 		}
-		while (copied) {
+		for (refs = 0; copied != 0; start = 0) {
 			int size = min_t(int, copied, PAGE_SIZE - start);
-			skb_fill_page_desc(skb, frag++, pages[n], start, size);
-			start = 0;
+			struct page *head = compound_head(pages[n]);
+
+			start += (pages[n] - head) << PAGE_SHIFT;
 			copied -= size;
 			n++;
+			if (frag) {
+				skb_frag_t *last = &skb_shinfo(skb)->frags[frag - 1];
+
+				if (head == skb_frag_page(last) &&
+				    start == skb_frag_off(last) + skb_frag_size(last)) {
+					skb_frag_size_add(last, size);
+					/* We combined this page, we need to release
+					 * a reference. Since compound pages refcount
+					 * is shared among many pages, batch the refcount
+					 * adjustments to limit false sharing.
+					 */
+					last_head = head;
+					refs++;
+					continue;
+				}
+			}
+			if (refs) {
+				page_ref_sub(last_head, refs);
+				refs = 0;
+			}
+			skb_fill_page_desc(skb, frag++, head, start, size);
 		}
+		if (refs)
+			page_ref_sub(last_head, refs);
 	}
 	return 0;
 }
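
To see the effect of the new loop, consider a 64KB payload landing in 16
subpages of one 2MB huge page: each iteration after the first extends the
previous fragment, and the 15 dropped references are batched into a single
page_ref_sub(). A hypothetical userspace simulation with stand-in types
(not kernel code, and assuming a page-aligned buffer so start begins at 0)
demonstrates the collapse:

	#include <stdio.h>

	#define SIM_PAGE_SIZE 4096
	#define SIM_NPAGES 16

	struct frag { int head; long off; long size; };

	int main(void)
	{
		struct frag frags[SIM_NPAGES];
		int nfrags = 0, refs = 0;
		long copied = (long)SIM_NPAGES * SIM_PAGE_SIZE;

		for (int n = 0; copied; n++) {
			long size = copied < SIM_PAGE_SIZE ? copied : SIM_PAGE_SIZE;
			int head = 0;                         /* compound_head(pages[n]) */
			long start = (long)n * SIM_PAGE_SIZE; /* (pages[n]-head)<<PAGE_SHIFT */

			copied -= size;
			if (nfrags) {
				struct frag *last = &frags[nfrags - 1];

				if (head == last->head &&
				    start == last->off + last->size) {
					last->size += size; /* coalesce */
					refs++;             /* batched page_ref_sub() */
					continue;
				}
			}
			frags[nfrags++] = (struct frag){ head, start, size };
		}
		/* Prints: frags=1 (was 16), refs batched=15 */
		printf("frags=%d (was %d), refs batched=%d\n",
		       nfrags, SIM_NPAGES, refs);
		return 0;
	}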
