Skip to content

Commit

Permalink
tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
Browse files Browse the repository at this point in the history
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
Arjun Roy authored and David S. Miller committed Feb 17, 2020
1 parent c8856c0 commit 3394651
Show file tree
Hide file tree
Showing 2 changed files with 8 additions and 1 deletion.
1 change: 1 addition & 0 deletions include/uapi/linux/tcp.h
Original file line number Diff line number Diff line change
Expand Up @@ -346,5 +346,6 @@ struct tcp_zerocopy_receive {
__u32 length; /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint; /* out: amount of bytes to skip */
__u32 inq; /* out: amount of bytes in read queue */
__s32 err; /* out: socket error */
};
#endif /* _UAPI_LINUX_TCP_H */
8 changes: 7 additions & 1 deletion net/ipv4/tcp.c
Original file line number Diff line number Diff line change
Expand Up @@ -3676,14 +3676,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
lock_sock(sk);
err = tcp_zerocopy_receive(sk, &zc);
release_sock(sk);
if (len == sizeof(zc))
goto zerocopy_rcv_sk_err;
switch (len) {
case sizeof(zc):
case offsetofend(struct tcp_zerocopy_receive, err):
goto zerocopy_rcv_sk_err;
case offsetofend(struct tcp_zerocopy_receive, inq):
goto zerocopy_rcv_inq;
case offsetofend(struct tcp_zerocopy_receive, length):
default:
goto zerocopy_rcv_out;
}
zerocopy_rcv_sk_err:
if (!err)
zc.err = sock_error(sk);
zerocopy_rcv_inq:
zc.inq = tcp_inq_hint(sk);
zerocopy_rcv_out:
Expand Down

0 comments on commit 3394651

Please sign in to comment.