Skip to content

Commit

Permalink
Merge branch 'cmsg_timestamp'
Browse files Browse the repository at this point in the history
Soheil Hassas Yeganeh says:

====================
add TX timestamping via cmsg

This patch series aim at enabling TX timestamping via cmsg.

Currently, to occasionally sample TX timestamping on a socket,
applications need to call setsockopt twice: first for enabling
timestamps and then for disabling them. This is an unnecessary
overhead. With cmsg, in contrast, applications can sample TX
timestamps per sendmsg().

This patch series adds the code for processing SO_TIMESTAMPING
for cmsg's of the SOL_SOCKET level, and adds the glue code for
TCP, UDP, and RAW for both IPv4 and IPv6. This implementation
supports overriding timestamp generation flags (i.e.,
SOF_TIMESTAMPING_TX_*) but not timestamp reporting flags.
Applications must still enable timestamp reporting via
setsockopt to receive timestamps.

This series does not change existing timestamping behavior for
applications that are using socket options.

I will follow up with another patch to enable timestamping for
active TFO (client-side TCP Fast Open) and also setting packet
mark via cmsgs.

Thanks!

Changes in v2:
        - Replace u32 with __u32 in the documentation.

Changes in v3:
	- Fix the broken build for L2TP (due to changes
	  in IPv6).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Apr 4, 2016
2 parents 833716e + fd91e12 commit 9d2355b
Show file tree
Hide file tree
Showing 27 changed files with 231 additions and 78 deletions.
48 changes: 45 additions & 3 deletions Documentation/networking/timestamping.txt
Original file line number Diff line number Diff line change
Expand Up @@ -44,11 +44,17 @@ timeval of SO_TIMESTAMP (ms).
Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In

err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val, &val);
err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val,
sizeof(val));

val is an integer with any of the following bits set. Setting other
bit returns EINVAL and does not change the current state.

The socket option configures timestamp generation for individual
sk_buffs (1.3.1), timestamp reporting to the socket's error
queue (1.3.2) and options (1.3.3). Timestamp generation can also
be enabled for individual sendmsg calls using cmsg (1.3.4).


1.3.1 Timestamp Generation

Expand All @@ -71,13 +77,16 @@ SOF_TIMESTAMPING_RX_SOFTWARE:
kernel receive stack.

SOF_TIMESTAMPING_TX_HARDWARE:
Request tx timestamps generated by the network adapter.
Request tx timestamps generated by the network adapter. This flag
can be enabled via both socket options and control messages.

SOF_TIMESTAMPING_TX_SOFTWARE:
Request tx timestamps when data leaves the kernel. These timestamps
are generated in the device driver as close as possible, but always
prior to, passing the packet to the network interface. Hence, they
require driver support and may not be available for all devices.
This flag can be enabled via both socket options and control messages.


SOF_TIMESTAMPING_TX_SCHED:
Request tx timestamps prior to entering the packet scheduler. Kernel
Expand All @@ -90,7 +99,8 @@ SOF_TIMESTAMPING_TX_SCHED:
machines with virtual devices where a transmitted packet travels
through multiple devices and, hence, multiple packet schedulers,
a timestamp is generated at each layer. This allows for fine
grained measurement of queuing delay.
grained measurement of queuing delay. This flag can be enabled
via both socket options and control messages.

SOF_TIMESTAMPING_TX_ACK:
Request tx timestamps when all data in the send buffer has been
Expand All @@ -99,6 +109,7 @@ SOF_TIMESTAMPING_TX_ACK:
over-report measurement, because the timestamp is generated when all
data up to and including the buffer at send() was acknowledged: the
cumulative acknowledgment. The mechanism ignores SACK and FACK.
This flag can be enabled via both socket options and control messages.


1.3.2 Timestamp Reporting
Expand Down Expand Up @@ -183,6 +194,37 @@ having access to the contents of the original packet, so cannot be
combined with SOF_TIMESTAMPING_OPT_TSONLY.


1.3.4. Enabling timestamps via control messages

In addition to socket options, timestamp generation can be requested
per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
Using this feature, applications can sample timestamps per sendmsg()
without paying the overhead of enabling and disabling timestamps via
setsockopt:

struct msghdr *msg;
...
cmsg = CMSG_FIRSTHDR(msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SO_TIMESTAMPING;
cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
*((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
SOF_TIMESTAMPING_TX_SOFTWARE |
SOF_TIMESTAMPING_TX_ACK;
err = sendmsg(fd, msg, 0);

The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
the SOF_TIMESTAMPING_TX_* flags set via setsockopt.

Moreover, applications must still enable timestamp reporting via
setsockopt to receive timestamps:

__u32 val = SOF_TIMESTAMPING_SOFTWARE |
SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, (void *) val,
sizeof(val));


1.4 Bytestream Timestamps

The SO_TIMESTAMPING interface supports timestamping of bytes in a
Expand Down
3 changes: 2 additions & 1 deletion drivers/net/tun.c
Original file line number Diff line number Diff line change
Expand Up @@ -861,7 +861,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
goto drop;

if (skb->sk && sk_fullsock(skb->sk)) {
sock_tx_timestamp(skb->sk, &skb_shinfo(skb)->tx_flags);
sock_tx_timestamp(skb->sk, skb->sk->sk_tsflags,
&skb_shinfo(skb)->tx_flags);
sw_tx_timestamp(skb);
}

Expand Down
3 changes: 2 additions & 1 deletion include/net/ip.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
}

struct ipcm_cookie {
struct sockcm_cookie sockc;
__be32 addr;
int oif;
struct ip_options_rcu *opt;
Expand Down Expand Up @@ -550,7 +551,7 @@ int ip_options_rcv_srr(struct sk_buff *skb);

void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb);
void ip_cmsg_recv_offset(struct msghdr *msg, struct sk_buff *skb, int offset);
int ip_cmsg_send(struct net *net, struct msghdr *msg,
int ip_cmsg_send(struct sock *sk, struct msghdr *msg,
struct ipcm_cookie *ipc, bool allow_ipv6);
int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval,
unsigned int optlen);
Expand Down
6 changes: 4 additions & 2 deletions include/net/ipv6.h
Original file line number Diff line number Diff line change
Expand Up @@ -867,7 +867,8 @@ int ip6_append_data(struct sock *sk,
int odd, struct sk_buff *skb),
void *from, int length, int transhdrlen, int hlimit,
int tclass, struct ipv6_txoptions *opt, struct flowi6 *fl6,
struct rt6_info *rt, unsigned int flags, int dontfrag);
struct rt6_info *rt, unsigned int flags, int dontfrag,
const struct sockcm_cookie *sockc);

int ip6_push_pending_frames(struct sock *sk);

Expand All @@ -884,7 +885,8 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
void *from, int length, int transhdrlen,
int hlimit, int tclass, struct ipv6_txoptions *opt,
struct flowi6 *fl6, struct rt6_info *rt,
unsigned int flags, int dontfrag);
unsigned int flags, int dontfrag,
const struct sockcm_cookie *sockc);

static inline struct sk_buff *ip6_finish_skb(struct sock *sk)
{
Expand Down
13 changes: 9 additions & 4 deletions include/net/sock.h
Original file line number Diff line number Diff line change
Expand Up @@ -1418,8 +1418,11 @@ void sk_send_sigurg(struct sock *sk);

struct sockcm_cookie {
u32 mark;
u16 tsflags;
};

int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
struct sockcm_cookie *sockc);
int sock_cmsg_send(struct sock *sk, struct msghdr *msg,
struct sockcm_cookie *sockc);

Expand Down Expand Up @@ -2054,19 +2057,21 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
sk->sk_stamp = skb->tstamp;
}

void __sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags);
void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags);

/**
* sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
* @sk: socket sending this packet
* @tsflags: timestamping flags to use
* @tx_flags: completed with instructions for time stamping
*
* Note : callers should take care of initial *tx_flags value (usually 0)
*/
static inline void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
static inline void sock_tx_timestamp(const struct sock *sk, __u16 tsflags,
__u8 *tx_flags)
{
if (unlikely(sk->sk_tsflags))
__sock_tx_timestamp(sk, tx_flags);
if (unlikely(tsflags))
__sock_tx_timestamp(tsflags, tx_flags);
if (unlikely(sock_flag(sk, SOCK_WIFI_STATUS)))
*tx_flags |= SKBTX_WIFI_STATUS;
}
Expand Down
3 changes: 2 additions & 1 deletion include/net/tcp.h
Original file line number Diff line number Diff line change
Expand Up @@ -754,7 +754,8 @@ struct tcp_skb_cb {
TCPCB_REPAIRED)

__u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
/* 1 byte hole */
__u8 txstamp_ack:1, /* Record TX timestamp for ack? */
unused:7;
__u32 ack_seq; /* Sequence number ACK'd */
union {
struct inet_skb_parm h4;
Expand Down
3 changes: 2 additions & 1 deletion include/net/transp_v6.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,8 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,

int ip6_datagram_send_ctl(struct net *net, struct sock *sk, struct msghdr *msg,
struct flowi6 *fl6, struct ipv6_txoptions *opt,
int *hlimit, int *tclass, int *dontfrag);
int *hlimit, int *tclass, int *dontfrag,
struct sockcm_cookie *sockc);

void ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp,
__u16 srcp, __u16 destp, int bucket);
Expand Down
10 changes: 10 additions & 0 deletions include/uapi/linux/net_tstamp.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,16 @@ enum {
SOF_TIMESTAMPING_LAST
};

/*
* SO_TIMESTAMPING flags are either for recording a packet timestamp or for
* reporting the timestamp to user space.
* Recording flags can be set both via socket options and control messages.
*/
#define SOF_TIMESTAMPING_TX_RECORD_MASK (SOF_TIMESTAMPING_TX_HARDWARE | \
SOF_TIMESTAMPING_TX_SOFTWARE | \
SOF_TIMESTAMPING_TX_SCHED | \
SOF_TIMESTAMPING_TX_ACK)

/**
* struct hwtstamp_config - %SIOCGHWTSTAMP and %SIOCSHWTSTAMP parameter
*
Expand Down
2 changes: 1 addition & 1 deletion net/can/raw.c
Original file line number Diff line number Diff line change
Expand Up @@ -755,7 +755,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
if (err < 0)
goto free_skb;

sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
sock_tx_timestamp(sk, sk->sk_tsflags, &skb_shinfo(skb)->tx_flags);

skb->dev = dev;
skb->sk = sk;
Expand Down
49 changes: 37 additions & 12 deletions net/core/sock.c
Original file line number Diff line number Diff line change
Expand Up @@ -832,7 +832,8 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
!(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
if (sk->sk_protocol == IPPROTO_TCP &&
sk->sk_type == SOCK_STREAM) {
if (sk->sk_state != TCP_ESTABLISHED) {
if ((1 << sk->sk_state) &
(TCPF_CLOSE | TCPF_LISTEN)) {
ret = -EINVAL;
break;
}
Expand Down Expand Up @@ -1866,27 +1867,51 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
}
EXPORT_SYMBOL(sock_alloc_send_skb);

int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
struct sockcm_cookie *sockc)
{
u32 tsflags;

switch (cmsg->cmsg_type) {
case SO_MARK:
if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;
if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
return -EINVAL;
sockc->mark = *(u32 *)CMSG_DATA(cmsg);
break;
case SO_TIMESTAMPING:
if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
return -EINVAL;

tsflags = *(u32 *)CMSG_DATA(cmsg);
if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK)
return -EINVAL;

sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
sockc->tsflags |= tsflags;
break;
default:
return -EINVAL;
}
return 0;
}
EXPORT_SYMBOL(__sock_cmsg_send);

int sock_cmsg_send(struct sock *sk, struct msghdr *msg,
struct sockcm_cookie *sockc)
{
struct cmsghdr *cmsg;
int ret;

for_each_cmsghdr(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
return -EINVAL;
if (cmsg->cmsg_level != SOL_SOCKET)
continue;
switch (cmsg->cmsg_type) {
case SO_MARK:
if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
return -EPERM;
if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
return -EINVAL;
sockc->mark = *(u32 *)CMSG_DATA(cmsg);
break;
default:
return -EINVAL;
}
ret = __sock_cmsg_send(sk, msg, cmsg, sockc);
if (ret)
return ret;
}
return 0;
}
Expand Down
9 changes: 8 additions & 1 deletion net/ipv4/ip_sockglue.c
Original file line number Diff line number Diff line change
Expand Up @@ -219,11 +219,12 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sk_buff *skb,
}
EXPORT_SYMBOL(ip_cmsg_recv_offset);

int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc,
int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
bool allow_ipv6)
{
int err, val;
struct cmsghdr *cmsg;
struct net *net = sock_net(sk);

for_each_cmsghdr(cmsg, msg) {
if (!CMSG_OK(msg, cmsg))
Expand All @@ -244,6 +245,12 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc,
continue;
}
#endif
if (cmsg->cmsg_level == SOL_SOCKET) {
if (__sock_cmsg_send(sk, msg, cmsg, &ipc->sockc))
return -EINVAL;
continue;
}

if (cmsg->cmsg_level != SOL_IP)
continue;
switch (cmsg->cmsg_type) {
Expand Down
7 changes: 4 additions & 3 deletions net/ipv4/ping.c
Original file line number Diff line number Diff line change
Expand Up @@ -737,17 +737,16 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
/* no remote port */
}

ipc.sockc.tsflags = sk->sk_tsflags;
ipc.addr = inet->inet_saddr;
ipc.opt = NULL;
ipc.oif = sk->sk_bound_dev_if;
ipc.tx_flags = 0;
ipc.ttl = 0;
ipc.tos = -1;

sock_tx_timestamp(sk, &ipc.tx_flags);

if (msg->msg_controllen) {
err = ip_cmsg_send(sock_net(sk), msg, &ipc, false);
err = ip_cmsg_send(sk, msg, &ipc, false);
if (unlikely(err)) {
kfree(ipc.opt);
return err;
Expand All @@ -768,6 +767,8 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
rcu_read_unlock();
}

sock_tx_timestamp(sk, ipc.sockc.tsflags, &ipc.tx_flags);

saddr = ipc.addr;
ipc.addr = faddr = daddr;

Expand Down
Loading

0 comments on commit 9d2355b

Please sign in to comment.