netdev: add per-queue statistics
The ethtool-nl family does a good job exposing various
protocol-related and IEEE/IETF statistics which used to get dumped
under ethtool -S, with creative names. Queue stats don't have a
netlink API yet, and remain the lion's share of ethtool -S output
for new drivers. Not only is that bad because the names differ from
driver to driver, but it's also bug-prone. Intuitively, drivers try
to report only the stats for active queues, but querying ethtool
stats involves multiple system calls, and the number of stats is
read separately from the stats themselves. Worse still, when user
space asks for the values of the stats, it doesn't inform the kernel
how big the buffer is. If the number of stats increases in the
meantime, the kernel will overflow the user buffer.
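
The race described above can be sketched in plain C. This is only a
user-space simulation under stated assumptions, not the ethtool ioctl
path: struct fake_dev, fake_get_count() and fake_fill() are
hypothetical names invented for illustration. The fill step is handed
the buffer capacity (which the legacy interface does not receive) so
that the example can report the would-be overflow instead of
triggering it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device whose stat count can change
 * between calls, e.g. when the queue count is reconfigured. */
struct fake_dev {
	size_t n_stats;
};

/* Call 1: user space learns how many stats to allocate room for. */
static size_t fake_get_count(const struct fake_dev *dev)
{
	return dev->n_stats;
}

/* Call 2: fill the buffer. Unlike the legacy interface described in
 * the commit message, this sketch is told the buffer capacity, so it
 * can clamp the copy. Returns how many values did not fit, i.e. how
 * many an unclamped copy would have written past the buffer. */
static size_t fake_fill(const struct fake_dev *dev, uint64_t *buf, size_t cap)
{
	size_t n = dev->n_stats < cap ? dev->n_stats : cap;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < n; i++)
		buf[i] = i;	/* placeholder values */
	return dev->n_stats - n;
}
```

If the count is read as 4, the device then grows to 6 stats, and the
fill is issued with the 4-entry buffer, fake_fill() reports 2 values
that had nowhere to go. Without a capacity argument those two writes
would land beyond the user buffer.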

Add a netlink API for dumping queue stats. Queue information is
exposed via the netdev-genl family, so add the stats there.
Support per-queue and sum-for-device dumps. The latter will be useful
when subsequent patches add more interesting common stats than
just bytes and packets.
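
The sum-for-device dump can be pictured with a small user-space
sketch. It follows the rule stated in the kernel-doc added to
include/net/netdev_queues.h in this commit: base stats are collected
first, per-queue values are added on top, and only statistics set in
the base are reported for the device. Everything named here is
hypothetical: STAT_UNSET (all-ones) is an assumed marker for "not
collected", and struct queue_stats, stats_add() and device_total() are
illustration-only helpers, not the code in net/core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker for "driver did not fill this in" (hypothetical). */
#define STAT_UNSET UINT64_MAX

struct queue_stats {
	uint64_t bytes;
	uint64_t packets;
};

/* Add one counter, but only if both sides were actually collected. */
static void counter_add(uint64_t *sum, uint64_t add)
{
	if (*sum != STAT_UNSET && add != STAT_UNSET)
		*sum += add;
}

static void stats_add(struct queue_stats *sum, const struct queue_stats *add)
{
	assert(sum != NULL && add != NULL);
	counter_add(&sum->bytes, add->bytes);
	counter_add(&sum->packets, add->packets);
}

/* Device total: start from the base (history of queues that no longer
 * exist), then add every currently active queue. A counter left unset
 * in the base stays unset and would not be reported. */
static struct queue_stats device_total(const struct queue_stats *base,
				       const struct queue_stats *queues,
				       size_t n_queues)
{
	struct queue_stats sum = *base;
	size_t i;

	for (i = 0; i < n_queues; i++)
		stats_add(&sum, &queues[i]);
	return sum;
}
```

With a base of 100 bytes and an unset packet counter, plus two live
queues reporting 10 and 20 bytes, the device total is 130 bytes and
the packet counter remains unset. This mirrors the second role of the
base callback: designating which totals are reliable for the device.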

The API does not currently distinguish between HW and SW stats.
The expectation is that the source of the stats will either not
matter much (good packets) or be obvious (skb alloc errors).

Acked-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committed Mar 8, 2024
1 parent e8bb2cc commit ab63a23
Showing 9 changed files with 421 additions and 0 deletions.
84 changes: 84 additions & 0 deletions Documentation/netlink/specs/netdev.yaml
@@ -74,6 +74,10 @@ definitions:
name: queue-type
type: enum
entries: [ rx, tx ]
-
name: qstats-scope
type: flags
entries: [ queue ]

attribute-sets:
-
@@ -265,6 +269,66 @@ attribute-sets:
doc: ID of the NAPI instance which services this queue.
type: u32

-
name: qstats
doc: |
Get device statistics, scoped to a device or a queue.
These statistics extend (and partially duplicate) statistics available
in struct rtnl_link_stats64.
Value of the `scope` attribute determines how statistics are
aggregated. When aggregated for the entire device the statistics
represent the total number of events since last explicit reset of
the device (i.e. not a reconfiguration like changing queue count).
When reported per-queue, however, the statistics may not add
up to the total number of events, will only be reported for currently
active objects, and will likely report the number of events since last
reconfiguration.
attributes:
-
name: ifindex
doc: ifindex of the netdevice to which stats belong.
type: u32
checks:
min: 1
-
name: queue-type
doc: Queue type as rx, tx, for queue-id.
type: u32
enum: queue-type
-
name: queue-id
doc: Queue ID, if stats are scoped to a single queue instance.
type: u32
-
name: scope
doc: |
What object type should be used to iterate over the stats.
type: uint
enum: qstats-scope
-
name: rx-packets
doc: |
Number of wire packets successfully received and passed to the stack.
For drivers supporting XDP, XDP is considered the first layer
of the stack, so packets consumed by XDP are still counted here.
type: uint
value: 8 # reserve some attr ids in case we need more metadata later
-
name: rx-bytes
doc: Successfully received bytes, see `rx-packets`.
type: uint
-
name: tx-packets
doc: |
Number of wire packets successfully sent. Packet is considered to be
successfully sent once it is in device memory (usually this means
the device has issued a DMA completion for the packet).
type: uint
-
name: tx-bytes
doc: Successfully sent bytes, see `tx-packets`.
type: uint

operations:
list:
-
@@ -405,6 +469,26 @@ operations:
attributes:
- ifindex
reply: *napi-get-op
-
name: qstats-get
doc: |
Get / dump fine grained statistics. Which statistics are reported
depends on the device and the driver, and whether the driver stores
software counters per-queue.
attribute-set: qstats
dump:
request:
attributes:
- scope
reply:
attributes:
- ifindex
- queue-type
- queue-id
- rx-packets
- rx-bytes
- tx-packets
- tx-bytes

mcast-groups:
list:
15 changes: 15 additions & 0 deletions Documentation/networking/statistics.rst
@@ -41,6 +41,15 @@ If `-s` is specified once the detailed errors won't be shown.

`ip` supports JSON formatting via the `-j` option.

Queue statistics
~~~~~~~~~~~~~~~~

Queue statistics are accessible via the netdev netlink family.

Currently no widely distributed CLI exists to access those statistics.
Kernel development tools (ynl) can be used to experiment with them,
see `Documentation/userspace-api/netlink/intro-specs.rst`.

Protocol-specific statistics
----------------------------

@@ -147,6 +156,12 @@ Statistics are reported both in the responses to link information
requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).

netdev (netlink)
~~~~~~~~~~~~~~~~

`netdev` generic netlink family allows accessing page pool and per queue
statistics.

ethtool
-------

3 changes: 3 additions & 0 deletions include/linux/netdevice.h
@@ -1955,6 +1955,7 @@ enum netdev_reg_state {
*
* @sysfs_rx_queue_group: Space for optional per-rx queue attributes
* @rtnl_link_ops: Rtnl_link_ops
* @stat_ops: Optional ops for queue-aware statistics
*
* @gso_max_size: Maximum size of generic segmentation offload
* @tso_max_size: Device (as in HW) limit on the max TSO request size
@@ -2335,6 +2336,8 @@ struct net_device {

const struct rtnl_link_ops *rtnl_link_ops;

const struct netdev_stat_ops *stat_ops;

/* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SEGS 65535u
#define GSO_LEGACY_MAX_SIZE 65536u
54 changes: 54 additions & 0 deletions include/net/netdev_queues.h
@@ -4,6 +4,60 @@

#include <linux/netdevice.h>

struct netdev_queue_stats_rx {
u64 bytes;
u64 packets;
};

struct netdev_queue_stats_tx {
u64 bytes;
u64 packets;
};

/**
* struct netdev_stat_ops - netdev ops for fine grained stats
* @get_queue_stats_rx: get stats for a given Rx queue
* @get_queue_stats_tx: get stats for a given Tx queue
* @get_base_stats: get base stats (not belonging to any live instance)
*
* Query stats for a given object. The values of the statistics are undefined
* on entry (specifically they are *not* zero-initialized). Drivers should
* assign values only to the statistics they collect. Statistics which are not
* collected must be left undefined.
*
* Queue objects are not necessarily persistent, and only currently active
* queues are queried by the per-queue callbacks. This means that per-queue
* statistics will not generally add up to the total number of events for
* the device. The @get_base_stats callback allows filling in the delta
* between events for currently live queues and overall device history.
* When the statistics for the entire device are queried, first @get_base_stats
* is issued to collect the delta, and then a series of per-queue callbacks.
* Only statistics which are set in @get_base_stats will be reported
* at the device level, meaning that unlike in queue callbacks, setting
* a statistic to zero in @get_base_stats is a legitimate thing to do.
* This is because @get_base_stats has a second function of designating which
* statistics are in fact correct for the entire device (e.g. when history
* for some of the events is not maintained, and reliable "total" cannot
* be provided).
*
* Device drivers can assume that when collecting total device stats,
* the @get_base_stats and subsequent per-queue calls are performed
* "atomically" (without releasing the rtnl_lock).
*
* Device drivers are encouraged to reset the per-queue statistics when
* number of queues change. This is because the primary use case for
* per-queue statistics is currently to detect traffic imbalance.
*/
struct netdev_stat_ops {
void (*get_queue_stats_rx)(struct net_device *dev, int idx,
struct netdev_queue_stats_rx *stats);
void (*get_queue_stats_tx)(struct net_device *dev, int idx,
struct netdev_queue_stats_tx *stats);
void (*get_base_stats)(struct net_device *dev,
struct netdev_queue_stats_rx *rx,
struct netdev_queue_stats_tx *tx);
};

/**
* DOC: Lockless queue stopping / waking helpers.
*
19 changes: 19 additions & 0 deletions include/uapi/linux/netdev.h
@@ -70,6 +70,10 @@ enum netdev_queue_type {
NETDEV_QUEUE_TYPE_TX,
};

enum netdev_qstats_scope {
NETDEV_QSTATS_SCOPE_QUEUE = 1,
};

enum {
NETDEV_A_DEV_IFINDEX = 1,
NETDEV_A_DEV_PAD,
@@ -132,6 +136,20 @@ enum {
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
};

enum {
NETDEV_A_QSTATS_IFINDEX = 1,
NETDEV_A_QSTATS_QUEUE_TYPE,
NETDEV_A_QSTATS_QUEUE_ID,
NETDEV_A_QSTATS_SCOPE,
NETDEV_A_QSTATS_RX_PACKETS = 8,
NETDEV_A_QSTATS_RX_BYTES,
NETDEV_A_QSTATS_TX_PACKETS,
NETDEV_A_QSTATS_TX_BYTES,

__NETDEV_A_QSTATS_MAX,
NETDEV_A_QSTATS_MAX = (__NETDEV_A_QSTATS_MAX - 1)
};

enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
@@ -144,6 +162,7 @@ enum {
NETDEV_CMD_PAGE_POOL_STATS_GET,
NETDEV_CMD_QUEUE_GET,
NETDEV_CMD_NAPI_GET,
NETDEV_CMD_QSTATS_GET,

__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
12 changes: 12 additions & 0 deletions net/core/netdev-genl-gen.c
@@ -68,6 +68,11 @@ static const struct nla_policy netdev_napi_get_dump_nl_policy[NETDEV_A_NAPI_IFIN
[NETDEV_A_NAPI_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
};

/* NETDEV_CMD_QSTATS_GET - dump */
static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE + 1] = {
[NETDEV_A_QSTATS_SCOPE] = NLA_POLICY_MASK(NLA_UINT, 0x1),
};

/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
@@ -138,6 +143,13 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.maxattr = NETDEV_A_NAPI_IFINDEX,
.flags = GENL_CMD_CAP_DUMP,
},
{
.cmd = NETDEV_CMD_QSTATS_GET,
.dumpit = netdev_nl_qstats_get_dumpit,
.policy = netdev_qstats_get_nl_policy,
.maxattr = NETDEV_A_QSTATS_SCOPE,
.flags = GENL_CMD_CAP_DUMP,
},
};

static const struct genl_multicast_group netdev_nl_mcgrps[] = {
2 changes: 2 additions & 0 deletions net/core/netdev-genl-gen.h
@@ -28,6 +28,8 @@ int netdev_nl_queue_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
int netdev_nl_napi_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_napi_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);

enum {
NETDEV_NLGRP_MGMT,