ethtool: provide customized dim profile management
The NetDIM library, currently leveraged by an array of NICs, delivers
excellent acceleration benefits. Nevertheless, NICs vary significantly
in their dim profile list prerequisites.

Specifically, virtio-net backends may be diverse software or hardware device
implementations, making a one-size-fits-all parameter list impractical.
On Alibaba Cloud, the virtio DPU's performance under the default DIM
profile falls short of expectations, partly due to a mismatch in
parameter configuration.

I also noticed that ice, idpf, ena and other NIC drivers use customized
profile lists or place restrictions on DIM capabilities.

Motivated by this, I added new parameters for "ethtool -C" that provide
per-device control to modify and access a device's interrupt parameters.

Usage
========
The target NIC is named ethx.

Assume that ethx only declares support for rx profile setting
(with the DIM_PROFILE_RX flag set in profile_flags) and supports modification
of the usec and pkts fields.

1. Query the currently customized list of the device

$ ethtool -c ethx
...
rx-profile:
{.usec =   1, .pkts = 256, .comps = n/a,},
{.usec =   8, .pkts = 256, .comps = n/a,},
{.usec =  64, .pkts = 256, .comps = n/a,},
{.usec = 128, .pkts = 256, .comps = n/a,},
{.usec = 256, .pkts = 256, .comps = n/a,}
tx-profile:   n/a

2. Tune

$ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n
"n" means do not modify this field, and "_" separates the elements of the
profile array.
$ ethtool -c ethx
...
rx-profile:
{.usec =   1, .pkts =   1, .comps = n/a,},
{.usec =   2, .pkts = 256, .comps = n/a,},
{.usec =   3, .pkts =   3, .comps = n/a,},
{.usec =   4, .pkts =   4, .comps = n/a,},
{.usec = 256, .pkts =   5, .comps = n/a,}
tx-profile:   n/a
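
The semantics of the rx-profile argument can be sketched in userspace C.
This is only a model of the behaviour described above, not the parser used
by ethtool or the kernel: the names `struct moder`, `parse_field` and
`apply_profile_arg` are hypothetical, and the sketch assumes exactly five
elements of three fields each, with every value fitting in 16 bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PROFILES 5	/* mirrors NET_DIM_PARAMS_NUM_PROFILES */

/* Hypothetical stand-in for the usec/pkts/comps part of dim_cq_moder. */
struct moder {
	unsigned short usec, pkts, comps;
};

/* Parse one field: "n" leaves *dst untouched, otherwise expect 0..65535. */
static int parse_field(const char *tok, unsigned short *dst)
{
	unsigned long v;
	char *end;

	if (strcmp(tok, "n") == 0)
		return 0;
	if (tok[0] < '0' || tok[0] > '9')
		return -EINVAL;
	errno = 0;
	v = strtoul(tok, &end, 10);
	if (*end != '\0' || errno || v > 0xffff)
		return -EINVAL;
	*dst = (unsigned short)v;
	return 0;
}

/*
 * Apply "u,p,c_u,p,c_..." to prof[]. Works on a copy so that a malformed
 * argument leaves the caller's profile unchanged.
 */
int apply_profile_arg(const char *arg, struct moder prof[NUM_PROFILES])
{
	struct moder tmp[NUM_PROFILES];
	char buf[128];
	char *p = buf;
	int i, f;

	assert(arg != NULL && prof != NULL);
	if (strlen(arg) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, arg);
	memcpy(tmp, prof, sizeof(tmp));

	for (i = 0; i < NUM_PROFILES; i++) {
		unsigned short *fields[3] = {
			&tmp[i].usec, &tmp[i].pkts, &tmp[i].comps
		};

		for (f = 0; f < 3; f++) {
			int last = (i == NUM_PROFILES - 1 && f == 2);
			/* "," separates fields, "_" separates elements. */
			char *end = last ? p + strlen(p)
					 : strchr(p, f < 2 ? ',' : '_');

			if (!end)
				return -EINVAL;
			*end = '\0';
			if (parse_field(p, fields[f]))
				return -EINVAL;
			p = last ? end : end + 1;
		}
	}

	memcpy(prof, tmp, sizeof(tmp));
	return 0;
}
```

Starting from the list shown in step 1 and applying the argument from
step 2, this model produces the usec and pkts values shown in the second
query.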

3. Hint
If the device does not support a type of customized DIM profile, or a field
within a profile, "n/a" is displayed in its place.

An attempt to modify an "n/a" field fails with -EOPNOTSUPP.
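
The capability check implied by the hint can be modelled as follows. This
is a minimal userspace sketch under stated assumptions, not the kernel's
implementation: the flag values are copied from the include/linux/dim.h
hunk of this commit, while the function `dim_check_update` and its
parameters are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>

#define BIT(n)			(1u << (n))

/* Values as defined in include/linux/dim.h by this commit. */
#define DIM_PROFILE_RX		BIT(0)
#define DIM_PROFILE_TX		BIT(1)
#define DIM_COALESCE_USEC	BIT(0)
#define DIM_COALESCE_PKTS	BIT(1)
#define DIM_COALESCE_COMPS	BIT(2)

/*
 * Hypothetical helper: decide whether a request may modify the given
 * fields of the Rx or Tx profile.
 *
 * @profile_flags: DIM_PROFILE_* bits declared by the device
 * @coal_flags:    DIM_COALESCE_* bits declared by the device
 * @which:         DIM_PROFILE_RX or DIM_PROFILE_TX (profile being changed)
 * @touched:       DIM_COALESCE_* bits of the fields the request changes
 */
int dim_check_update(unsigned int profile_flags, unsigned int coal_flags,
		     unsigned int which, unsigned int touched)
{
	assert(which == DIM_PROFILE_RX || which == DIM_PROFILE_TX);

	/* The whole profile is shown as "n/a". */
	if (!(profile_flags & which))
		return -EOPNOTSUPP;

	/* At least one touched field is shown as "n/a". */
	if (touched & ~coal_flags)
		return -EOPNOTSUPP;

	return 0;
}
```

For the ethx device assumed above (DIM_PROFILE_RX with usec and pkts
modifiable), a request that changes only usec and pkts of the rx profile
passes, while a request that changes comps, or anything in the tx profile,
gets -EOPNOTSUPP in this model.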

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heng Qi authored and Jakub Kicinski committed Jun 26, 2024
1 parent b65e697 commit f750dfe
Showing 10 changed files with 509 additions and 3 deletions.
31 changes: 31 additions & 0 deletions Documentation/netlink/specs/ethtool.yaml
@@ -414,6 +414,26 @@ attribute-sets:
        name: combined-count
        type: u32

  -
    name: irq-moderation
    attributes:
      -
        name: usec
        type: u32
      -
        name: pkts
        type: u32
      -
        name: comps
        type: u32
  -
    name: profile
    attributes:
      -
        name: irq-moderation
        type: nest
        multi-attr: true
        nested-attributes: irq-moderation
  -
    name: coalesce
    attributes:
@@ -502,6 +522,15 @@ attribute-sets:
      -
        name: tx-aggr-time-usecs
        type: u32
      -
        name: rx-profile
        type: nest
        nested-attributes: profile
      -
        name: tx-profile
        type: nest
        nested-attributes: profile

  -
    name: pause-stat
    attributes:
@@ -1325,6 +1354,8 @@ operations:
            - tx-aggr-max-bytes
            - tx-aggr-max-frames
            - tx-aggr-time-usecs
            - rx-profile
            - tx-profile
      dump: *coalesce-get-op
    -
      name: coalesce-set
8 changes: 8 additions & 0 deletions Documentation/networking/ethtool-netlink.rst
@@ -1033,6 +1033,8 @@ Kernel response contents:
``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx
``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx
``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
=========================================== ====== =======================

Attributes are only included in reply if their value is not zero or the
@@ -1062,6 +1064,10 @@ block should be sent.
This feature is mainly of interest for specific USB devices which do not cope
well with frequent small-sized URB transmissions.

``ETHTOOL_A_COALESCE_RX_PROFILE`` and ``ETHTOOL_A_COALESCE_TX_PROFILE`` refer
to DIM parameters, see `Generic Network Dynamic Interrupt Moderation (Net DIM)
<https://www.kernel.org/doc/Documentation/networking/net_dim.rst>`_.

COALESCE_SET
============

@@ -1098,6 +1104,8 @@ Request contents:
``ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES`` u32 max aggr size, Tx
``ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES`` u32 max aggr packets, Tx
``ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS`` u32 time (us), aggr, Tx
``ETHTOOL_A_COALESCE_RX_PROFILE`` nested profile of DIM, Rx
``ETHTOOL_A_COALESCE_TX_PROFILE`` nested profile of DIM, Tx
=========================================== ====== =======================

Request is rejected if it contains attributes declared as unsupported by driver (i.e.
42 changes: 42 additions & 0 deletions Documentation/networking/net_dim.rst
@@ -169,6 +169,48 @@ usage is not complete but it should make the outline of the usage clear.
	...
  }

Tuning DIM
==========

Net DIM serves a range of network devices and delivers excellent acceleration
benefits. However, the preset DIM profiles do not suit every network device,
and a mismatch between the profiles and a device's characteristics has been
identified as one cause of suboptimal performance on DIM-enabled devices.

To address this issue, Net DIM introduces a per-device control to modify and
access a device's ``rx-profile`` and ``tx-profile`` parameters.

Assume that the target network device is named ethx, that ethx only declares
support for RX profile setting, and that it supports modification of the
``usec`` and ``pkts`` fields (see the data structure
:c:type:`struct dim_cq_moder <dim_cq_moder>`).

You can use ethtool to modify the current RX DIM profile where all
values are 64::

    $ ethtool -C ethx rx-profile 1,1,n_2,2,n_3,n,n_n,4,n_n,n,n

``n`` means do not modify this field, and ``_`` separates structure
elements of the profile array.

Query the current profiles using::

    $ ethtool -c ethx
    ...
    rx-profile:
    {.usec =   1, .pkts =   1, .comps = n/a,},
    {.usec =   2, .pkts =   2, .comps = n/a,},
    {.usec =   3, .pkts =  64, .comps = n/a,},
    {.usec =  64, .pkts =   4, .comps = n/a,},
    {.usec =  64, .pkts =  64, .comps = n/a,}
    tx-profile:   n/a

If the network device does not support a DIM profile, or specific fields of
it, ``n/a`` is displayed in its place. An attempt to modify an ``n/a`` field
is rejected with an error message.


Dynamic Interrupt Moderation (DIM) library API
==============================================

58 changes: 58 additions & 0 deletions include/linux/dim.h
@@ -10,6 +10,8 @@
#include <linux/types.h>
#include <linux/workqueue.h>

struct net_device;

/* Number of DIM profiles and period mode. */
#define NET_DIM_PARAMS_NUM_PROFILES 5
#define NET_DIM_DEFAULT_RX_CQ_PKTS_FROM_EQE 256
@@ -45,12 +47,45 @@
 * @pkts: CQ packet counter suggestion (by DIM)
 * @comps: Completion counter
 * @cq_period_mode: CQ period count mode (from CQE/EQE)
 * @rcu: for asynchronous kfree_rcu
 */
struct dim_cq_moder {
	u16 usec;
	u16 pkts;
	u16 comps;
	u8 cq_period_mode;
	struct rcu_head rcu;
};

#define DIM_PROFILE_RX		BIT(0)	/* support rx profile modification */
#define DIM_PROFILE_TX		BIT(1)	/* support tx profile modification */

#define DIM_COALESCE_USEC	BIT(0)	/* support usec field modification */
#define DIM_COALESCE_PKTS	BIT(1)	/* support pkts field modification */
#define DIM_COALESCE_COMPS	BIT(2)	/* support comps field modification */

/**
 * struct dim_irq_moder - Structure for irq moderation information.
 * Used to collect irq moderation related information.
 *
 * @profile_flags: DIM_PROFILE_*
 * @coal_flags: DIM_COALESCE_* for Rx and Tx
 * @dim_rx_mode: Rx DIM period count mode: CQE or EQE
 * @dim_tx_mode: Tx DIM period count mode: CQE or EQE
 * @rx_profile: DIM profile list for Rx
 * @tx_profile: DIM profile list for Tx
 * @rx_dim_work: Rx DIM worker scheduled by net_dim()
 * @tx_dim_work: Tx DIM worker scheduled by net_dim()
 */
struct dim_irq_moder {
	u8 profile_flags;
	u8 coal_flags;
	u8 dim_rx_mode;
	u8 dim_tx_mode;
	struct dim_cq_moder __rcu *rx_profile;
	struct dim_cq_moder __rcu *tx_profile;
	void (*rx_dim_work)(struct work_struct *work);
	void (*tx_dim_work)(struct work_struct *work);
};

/**
@@ -198,6 +233,29 @@ enum dim_step_result {
	DIM_ON_EDGE,
};

/**
 * net_dim_init_irq_moder - collect information to initialize irq moderation
 * @dev: target network device
 * @profile_flags: Rx or Tx profile modification capability
 * @coal_flags: irq moderation params flags
 * @rx_mode: CQ period mode for Rx
 * @tx_mode: CQ period mode for Tx
 * @rx_dim_work: Rx worker called after dim decision
 * @tx_dim_work: Tx worker called after dim decision
 *
 * Return: 0 on success or a negative error code.
 */
int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
			   void (*rx_dim_work)(struct work_struct *work),
			   void (*tx_dim_work)(struct work_struct *work));

/**
 * net_dim_free_irq_moder - free fields for irq moderation
 * @dev: target network device
 */
void net_dim_free_irq_moder(struct net_device *dev);

/**
 * dim_on_top - check if current state is a good place to stop (top location)
 * @dim: DIM context
4 changes: 3 additions & 1 deletion include/linux/ethtool.h
@@ -284,7 +284,9 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
#define ETHTOOL_COALESCE_TX_AGGR_MAX_BYTES BIT(24)
#define ETHTOOL_COALESCE_TX_AGGR_MAX_FRAMES BIT(25)
#define ETHTOOL_COALESCE_TX_AGGR_TIME_USECS BIT(26)
-#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(26, 0)
+#define ETHTOOL_COALESCE_RX_PROFILE BIT(27)
+#define ETHTOOL_COALESCE_TX_PROFILE BIT(28)
+#define ETHTOOL_COALESCE_ALL_PARAMS GENMASK(28, 0)

#define ETHTOOL_COALESCE_USECS \
(ETHTOOL_COALESCE_RX_USECS | ETHTOOL_COALESCE_TX_USECS)
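
The reason ETHTOOL_COALESCE_ALL_PARAMS is widened from GENMASK(26, 0) to
GENMASK(28, 0) can be checked with a small userspace sketch. The BIT() and
GENMASK() macros below are simplified 32-bit stand-ins written for this
example, not the kernel's definitions, and `params_within`, `check_masks`,
OLD_ALL_PARAMS and NEW_ALL_PARAMS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel macros (assumption). */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

#define ETHTOOL_COALESCE_RX_PROFILE	BIT(27)
#define ETHTOOL_COALESCE_TX_PROFILE	BIT(28)

#define OLD_ALL_PARAMS	GENMASK(26, 0)	/* before this commit */
#define NEW_ALL_PARAMS	GENMASK(28, 0)	/* after this commit */

/* Return 1 if every bit set in params is also set in all_params. */
int params_within(uint32_t params, uint32_t all_params)
{
	return (params & ~all_params) == 0;
}

/* The two new bits fit only in the widened mask. */
void check_masks(void)
{
	uint32_t profiles = ETHTOOL_COALESCE_RX_PROFILE |
			    ETHTOOL_COALESCE_TX_PROFILE;

	assert(params_within(profiles, NEW_ALL_PARAMS));
	assert(!params_within(profiles, OLD_ALL_PARAMS));
}
```

Without the wider mask, the two new bits would fall outside the set of
parameters the mask describes.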
3 changes: 3 additions & 0 deletions include/linux/netdevice.h
@@ -2402,6 +2402,9 @@ struct net_device {
	/** @page_pools: page pools created for this netdevice */
	struct hlist_head	page_pools;
#endif

	/** @irq_moder: dim parameters used if IS_ENABLED(CONFIG_DIMLIB). */
	struct dim_irq_moder	*irq_moder;
};
#define to_net_dev(d) container_of(d, struct net_device, dev)

22 changes: 22 additions & 0 deletions include/uapi/linux/ethtool_netlink.h
@@ -415,12 +415,34 @@ enum {
	ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES,		/* u32 */
	ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES,		/* u32 */
	ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS,		/* u32 */
	/* nest - _A_PROFILE_IRQ_MODERATION */
	ETHTOOL_A_COALESCE_RX_PROFILE,
	/* nest - _A_PROFILE_IRQ_MODERATION */
	ETHTOOL_A_COALESCE_TX_PROFILE,

	/* add new constants above here */
	__ETHTOOL_A_COALESCE_CNT,
	ETHTOOL_A_COALESCE_MAX = (__ETHTOOL_A_COALESCE_CNT - 1)
};

enum {
	ETHTOOL_A_PROFILE_UNSPEC,
	/* nest, _A_IRQ_MODERATION_* */
	ETHTOOL_A_PROFILE_IRQ_MODERATION,
	__ETHTOOL_A_PROFILE_CNT,
	ETHTOOL_A_PROFILE_MAX = (__ETHTOOL_A_PROFILE_CNT - 1)
};

enum {
	ETHTOOL_A_IRQ_MODERATION_UNSPEC,
	ETHTOOL_A_IRQ_MODERATION_USEC,			/* u32 */
	ETHTOOL_A_IRQ_MODERATION_PKTS,			/* u32 */
	ETHTOOL_A_IRQ_MODERATION_COMPS,			/* u32 */

	__ETHTOOL_A_IRQ_MODERATION_CNT,
	ETHTOOL_A_IRQ_MODERATION_MAX = (__ETHTOOL_A_IRQ_MODERATION_CNT - 1)
};

/* PAUSE */

enum {
70 changes: 70 additions & 0 deletions lib/dim/net_dim.c
@@ -4,6 +4,7 @@
*/

#include <linux/dim.h>
#include <linux/rtnetlink.h>

/*
* Net DIM profiles:
@@ -95,6 +96,75 @@ net_dim_get_def_tx_moderation(u8 cq_period_mode)
}
EXPORT_SYMBOL(net_dim_get_def_tx_moderation);

int net_dim_init_irq_moder(struct net_device *dev, u8 profile_flags,
			   u8 coal_flags, u8 rx_mode, u8 tx_mode,
			   void (*rx_dim_work)(struct work_struct *work),
			   void (*tx_dim_work)(struct work_struct *work))
{
	struct dim_cq_moder *rxp = NULL, *txp;
	struct dim_irq_moder *moder;
	int len;

	dev->irq_moder = kzalloc(sizeof(*dev->irq_moder), GFP_KERNEL);
	if (!dev->irq_moder)
		return -ENOMEM;

	moder = dev->irq_moder;
	len = NET_DIM_PARAMS_NUM_PROFILES * sizeof(*moder->rx_profile);

	moder->coal_flags = coal_flags;
	moder->profile_flags = profile_flags;

	if (profile_flags & DIM_PROFILE_RX) {
		moder->rx_dim_work = rx_dim_work;
		moder->dim_rx_mode = rx_mode;
		rxp = kmemdup(rx_profile[rx_mode], len, GFP_KERNEL);
		if (!rxp)
			goto free_moder;

		rcu_assign_pointer(moder->rx_profile, rxp);
	}

	if (profile_flags & DIM_PROFILE_TX) {
		moder->tx_dim_work = tx_dim_work;
		moder->dim_tx_mode = tx_mode;
		txp = kmemdup(tx_profile[tx_mode], len, GFP_KERNEL);
		if (!txp)
			goto free_rxp;

		rcu_assign_pointer(moder->tx_profile, txp);
	}

	return 0;

free_rxp:
	kfree(rxp);
free_moder:
	kfree(moder);
	return -ENOMEM;
}
EXPORT_SYMBOL(net_dim_init_irq_moder);

/* RTNL lock is held. */
void net_dim_free_irq_moder(struct net_device *dev)
{
	struct dim_cq_moder *rxp, *txp;

	if (!dev->irq_moder)
		return;

	rxp = rtnl_dereference(dev->irq_moder->rx_profile);
	txp = rtnl_dereference(dev->irq_moder->tx_profile);

	rcu_assign_pointer(dev->irq_moder->rx_profile, NULL);
	rcu_assign_pointer(dev->irq_moder->tx_profile, NULL);

	kfree_rcu(rxp, rcu);
	kfree_rcu(txp, rcu);
	kfree(dev->irq_moder);
}
EXPORT_SYMBOL(net_dim_free_irq_moder);

static int net_dim_step(struct dim *dim)
{
	if (dim->tired == (NET_DIM_PARAMS_NUM_PROFILES * 2))
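
The allocate, copy defaults, and unwind-on-failure pattern of
net_dim_init_irq_moder() and net_dim_free_irq_moder() can be illustrated
with a userspace model. This is a sketch under loud assumptions, not
kernel code: malloc/calloc/free stand in for kzalloc/kmemdup/kfree, RCU
and the RTNL lock are not modelled, the work callbacks and period modes
are omitted, and all `model_*` and `sample_*` names are hypothetical.
The sample table uses the values from the "ethtool -c" output in the
commit message and is reused for Tx purely for illustration. Unlike the
code above, the model attaches the structure to the device only after
every allocation has succeeded; that is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PROFILES	5	/* mirrors NET_DIM_PARAMS_NUM_PROFILES */
#define DIM_PROFILE_RX	(1u << 0)
#define DIM_PROFILE_TX	(1u << 1)

struct model_cq_moder {
	unsigned short usec, pkts, comps;
};

struct model_irq_moder {
	unsigned char profile_flags;
	struct model_cq_moder *rx_profile;
	struct model_cq_moder *tx_profile;
};

struct model_netdev {
	struct model_irq_moder *irq_moder;
};

/* Sample defaults taken from the "ethtool -c" output in the message. */
static const struct model_cq_moder sample_profile[NUM_PROFILES] = {
	{1, 256, 0}, {8, 256, 0}, {64, 256, 0}, {128, 256, 0}, {256, 256, 0}
};

/* Userspace stand-in for kmemdup() of one profile list. */
static struct model_cq_moder *model_dup(const struct model_cq_moder *src)
{
	struct model_cq_moder *p = malloc(NUM_PROFILES * sizeof(*p));

	if (p)
		memcpy(p, src, NUM_PROFILES * sizeof(*p));
	return p;
}

int model_init_irq_moder(struct model_netdev *dev, unsigned char flags)
{
	struct model_irq_moder *m;

	assert(dev != NULL);
	m = calloc(1, sizeof(*m));
	if (!m)
		return -ENOMEM;
	m->profile_flags = flags;

	/* Only the directions the device declares get a private copy. */
	if (flags & DIM_PROFILE_RX) {
		m->rx_profile = model_dup(sample_profile);
		if (!m->rx_profile)
			goto free_moder;
	}
	if (flags & DIM_PROFILE_TX) {
		m->tx_profile = model_dup(sample_profile);
		if (!m->tx_profile)
			goto free_rxp;
	}

	dev->irq_moder = m;
	return 0;

free_rxp:
	free(m->rx_profile);	/* free(NULL) is a no-op */
free_moder:
	free(m);
	return -ENOMEM;
}

void model_free_irq_moder(struct model_netdev *dev)
{
	if (!dev->irq_moder)
		return;
	free(dev->irq_moder->rx_profile);
	free(dev->irq_moder->tx_profile);
	free(dev->irq_moder);
	dev->irq_moder = NULL;
}
```

A device that passes only DIM_PROFILE_RX ends up with a private, writable
copy of the Rx list and a NULL Tx list, which corresponds to the
"tx-profile: n/a" line in the usage example.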
1 change: 1 addition & 0 deletions net/Kconfig
@@ -508,6 +508,7 @@ config FAILOVER

config ETHTOOL_NETLINK
	bool "Netlink interface for ethtool"
	select DIMLIB
	default y
	help
	  An alternative userspace interface for ethtool based on generic