Skip to content

Commit

Permalink
Merge branch 'devlink-rate-objects'
Browse files Browse the repository at this point in the history
Dmytro Linkin says:

====================
devlink: rate objects API

Resending without RFC.

Currently kernel provides a way to change tx rate of single VF in
switchdev mode via tc-police action. When lots of VFs are configured
management of theirs rates becomes non-trivial task and some grouping
mechanism is required. Implementing such grouping in tc-police will bring
flow related limitations and unwanted complications, like:
- tc-police is a policer and there is a user request for a traffic
  shaper, so shared tc-police action is not suitable;
- flows requires net device to be placed on, means "groups" wouldn't
  have net device instance itself. Taking into the account previous
  point was reviewed a sollution, when representor have a policer and
  the driver use a shaper if qdisc contains group of VFs - such approach
  ugly, compilated and misleading;
- TC is ingress only, while configuring "other" side of the wire looks
  more like a "real" picture where shaping is outside of the steering
  world, similar to "ip link" command;

According to that devlink is the most appropriate place.

This series introduces devlink API for managing tx rate of single devlink
port or of a group by invoking callbacks (see below) of corresponding
driver. Also devlink port or a group can be added to the parent group,
where driver responsible to handle rates of a group elements. To achieve
all of that new rate object is added. It can be one of the two types:
- leaf - represents a single devlink port; created/destroyed by the
  driver and bound to the devlink port. As example, some driver may
  create leaf rate object for every devlink port associated with VF.
  Since leaf have 1to1 mapping to it's devlink port, in user space it is
  referred as pci/<bus_addr>/<port_index>;
- node - represents a group of rate objects; created/deleted by request
  from the userspace; initially empty (no rate objects added). In
  userspace it is referred as pci/<bus_addr>/<node_name>, where node name
  can be any, except decimal number, to avoid collisions with leafs.

devlink_ops extended with following callbacks:
- rate_{leaf|node}_tx_{share|max}_set
- rate_node_{new|del}
- rate_{leaf|node}_parent_set

KAPI provides:
- creation/destruction of the leaf rate object associated with devlink
  port
- destruction of rate nodes to allow a vendor driver to free allocated
  resources on driver removal or due to the other reasons when nodes
  destruction required

UAPI provides:
- dumping all or single rate objects
- setting tx_{share|max} of rate object of any type
- creating/deleting node rate object
- setting/unsetting parent of any rate object

Added devlink rate object support for netdevsim driver

Issues/open questions:
- Does user need DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean all
  children of particular parent node? For example:
  $ devlink port function rate flush netdevsim/netdevsim10/group
- priv pointer passed to the callbacks is a source of bugs; in leaf case
  driver can embed rate object into internal structure and use
  container_of() on it; in node case it cannot be done since nodes are
  created from userspace

v1->v2:
- fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
- s/func/function/ for all devlink port command occurences

v2->v3:
- devlink:
  - added devlink_rate_nodes_destroy() function
- netdevsim:
  - added call of devlink_rate_nodes_destroy() function
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
David S. Miller committed Jun 2, 2021
2 parents 53c7bb5 + b62767e commit 270d47d
Show file tree
Hide file tree
Showing 10 changed files with 1,564 additions and 57 deletions.
35 changes: 35 additions & 0 deletions Documentation/networking/devlink/devlink-port.rst
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,41 @@ device to instantiate the subfunction device on particular PCI function.
A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`.
At this point a matching subfunction driver binds to the subfunction's auxiliary device.

Rate object management
======================

Devlink provides API to manage tx rates of single devlink port or a group.
This is done through rate objects, which can be one of the two types:

``leaf``
Represents a single devlink port; created/destroyed by the driver. Since leaf
have 1to1 mapping to its devlink port, in user space it is referred as
``pci/<bus_addr>/<port_index>``;

``node``
Represents a group of rate objects (leafs and/or nodes); created/deleted by
request from the userspace; initially empty (no rate objects added). In
userspace it is referred as ``pci/<bus_addr>/<node_name>``, where
``node_name`` can be any identifier, except decimal number, to avoid
collisions with leafs.

API allows to configure following rate object's parameters:

``tx_share``
Minimum TX rate value shared among all other rate objects, or rate objects
that parts of the parent group, if it is a part of the same group.

``tx_max``
Maximum TX rate value.

``parent``
Parent node name. Parent node rate limits are considered as additional limits
to all node children limits. ``tx_max`` is an upper limit for children.
``tx_share`` is a total bandwidth distributed among children.

Driver implementations are allowed to support both or either rate object types
and setting methods of their parameters.

Terms and Definitions
=====================

Expand Down
26 changes: 26 additions & 0 deletions Documentation/networking/devlink/netdevsim.rst
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,32 @@ entries, FIB rule entries and nexthops that the driver will allow.
$ devlink resource set netdevsim/netdevsim0 path /nexthops size 16
$ devlink dev reload netdevsim/netdevsim0
Rate objects
============

The ``netdevsim`` driver supports rate objects management, which includes:

- registerging/unregistering leaf rate objects per VF devlink port;
- creation/deletion node rate objects;
- setting tx_share and tx_max rate values for any rate object type;
- setting parent node for any rate object type.

Rate nodes and it's parameters are exposed in ``netdevsim`` debugfs in RO mode.
For example created rate node with name ``some_group``:

.. code:: shell
$ ls /sys/kernel/debug/netdevsim/netdevsim0/rate_groups/some_group
rate_parent tx_max tx_share
Same parameters are exposed for leaf objects in corresponding ports directories.
For ex.:

.. code:: shell
$ ls /sys/kernel/debug/netdevsim/netdevsim0/ports/1
dev ethtool rate_parent tx_max tx_share
Driver-specific Traps
=====================

Expand Down
129 changes: 114 additions & 15 deletions drivers/net/netdevsim/bus.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,21 +27,34 @@ static struct nsim_bus_dev *to_nsim_bus_dev(struct device *dev)
static int nsim_bus_dev_vfs_enable(struct nsim_bus_dev *nsim_bus_dev,
unsigned int num_vfs)
{
nsim_bus_dev->vfconfigs = kcalloc(num_vfs,
sizeof(struct nsim_vf_config),
GFP_KERNEL | __GFP_NOWARN);
struct nsim_dev *nsim_dev;
int err = 0;

if (nsim_bus_dev->max_vfs < num_vfs)
return -ENOMEM;

if (!nsim_bus_dev->vfconfigs)
return -ENOMEM;
nsim_bus_dev->num_vfs = num_vfs;

return 0;
nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
if (nsim_esw_mode_is_switchdev(nsim_dev)) {
err = nsim_esw_switchdev_enable(nsim_dev, NULL);
if (err)
nsim_bus_dev->num_vfs = 0;
}

return err;
}

static void nsim_bus_dev_vfs_disable(struct nsim_bus_dev *nsim_bus_dev)
void nsim_bus_dev_vfs_disable(struct nsim_bus_dev *nsim_bus_dev)
{
kfree(nsim_bus_dev->vfconfigs);
nsim_bus_dev->vfconfigs = NULL;
struct nsim_dev *nsim_dev;

nsim_bus_dev->num_vfs = 0;
nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
if (nsim_esw_mode_is_switchdev(nsim_dev))
nsim_esw_legacy_enable(nsim_dev, NULL);
}

static ssize_t
Expand All @@ -56,7 +69,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr,
if (ret)
return ret;

rtnl_lock();
mutex_lock(&nsim_bus_dev->vfs_lock);
if (nsim_bus_dev->num_vfs == num_vfs)
goto exit_good;
if (nsim_bus_dev->num_vfs && num_vfs) {
Expand All @@ -74,7 +87,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr,
exit_good:
ret = count;
exit_unlock:
rtnl_unlock();
mutex_unlock(&nsim_bus_dev->vfs_lock);

return ret;
}
Expand All @@ -92,6 +105,79 @@ static struct device_attribute nsim_bus_dev_numvfs_attr =
__ATTR(sriov_numvfs, 0664, nsim_bus_dev_numvfs_show,
nsim_bus_dev_numvfs_store);

ssize_t nsim_bus_dev_max_vfs_read(struct file *file,
char __user *data,
size_t count, loff_t *ppos)
{
struct nsim_bus_dev *nsim_bus_dev = file->private_data;
char buf[11];
size_t len;

len = snprintf(buf, sizeof(buf), "%u\n", nsim_bus_dev->max_vfs);
if (len < 0)
return len;

return simple_read_from_buffer(data, count, ppos, buf, len);
}

ssize_t nsim_bus_dev_max_vfs_write(struct file *file,
const char __user *data,
size_t count, loff_t *ppos)
{
struct nsim_bus_dev *nsim_bus_dev = file->private_data;
struct nsim_vf_config *vfconfigs;
ssize_t ret;
char buf[10];
u32 val;

if (*ppos != 0)
return 0;

if (count >= sizeof(buf))
return -ENOSPC;

mutex_lock(&nsim_bus_dev->vfs_lock);
/* Reject if VFs are configured */
if (nsim_bus_dev->num_vfs) {
ret = -EBUSY;
goto unlock;
}

ret = copy_from_user(buf, data, count);
if (ret) {
ret = -EFAULT;
goto unlock;
}

buf[count] = '\0';
ret = kstrtouint(buf, 10, &val);
if (ret) {
ret = -EIO;
goto unlock;
}

/* max_vfs limited by the maximum number of provided port indexes */
if (val > NSIM_DEV_VF_PORT_INDEX_MAX - NSIM_DEV_VF_PORT_INDEX_BASE) {
ret = -ERANGE;
goto unlock;
}

vfconfigs = kcalloc(val, sizeof(struct nsim_vf_config), GFP_KERNEL | __GFP_NOWARN);
if (!vfconfigs) {
ret = -ENOMEM;
goto unlock;
}

kfree(nsim_bus_dev->vfconfigs);
nsim_bus_dev->vfconfigs = vfconfigs;
nsim_bus_dev->max_vfs = val;
*ppos += count;
ret = count;
unlock:
mutex_unlock(&nsim_bus_dev->vfs_lock);
return ret;
}

static ssize_t
new_port_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
Expand All @@ -113,7 +199,7 @@ new_port_store(struct device *dev, struct device_attribute *attr,

mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
devlink_reload_disable(devlink);
ret = nsim_dev_port_add(nsim_bus_dev, port_index);
ret = nsim_dev_port_add(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
devlink_reload_enable(devlink);
mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
return ret ? ret : count;
Expand Down Expand Up @@ -142,7 +228,7 @@ del_port_store(struct device *dev, struct device_attribute *attr,

mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
devlink_reload_disable(devlink);
ret = nsim_dev_port_del(nsim_bus_dev, port_index);
ret = nsim_dev_port_del(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
devlink_reload_enable(devlink);
mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
return ret ? ret : count;
Expand All @@ -168,9 +254,6 @@ static const struct attribute_group *nsim_bus_dev_attr_groups[] = {

static void nsim_bus_dev_release(struct device *dev)
{
struct nsim_bus_dev *nsim_bus_dev = to_nsim_bus_dev(dev);

nsim_bus_dev_vfs_disable(nsim_bus_dev);
}

static struct device_type nsim_bus_dev_type = {
Expand Down Expand Up @@ -311,6 +394,8 @@ static struct bus_type nsim_bus = {
.num_vf = nsim_num_vf,
};

#define NSIM_BUS_DEV_MAX_VFS 4

static struct nsim_bus_dev *
nsim_bus_dev_new(unsigned int id, unsigned int port_count)
{
Expand All @@ -329,15 +414,28 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
nsim_bus_dev->dev.type = &nsim_bus_dev_type;
nsim_bus_dev->port_count = port_count;
nsim_bus_dev->initial_net = current->nsproxy->net_ns;
nsim_bus_dev->max_vfs = NSIM_BUS_DEV_MAX_VFS;
mutex_init(&nsim_bus_dev->nsim_bus_reload_lock);
mutex_init(&nsim_bus_dev->vfs_lock);
/* Disallow using nsim_bus_dev */
smp_store_release(&nsim_bus_dev->init, false);

nsim_bus_dev->vfconfigs = kcalloc(nsim_bus_dev->max_vfs,
sizeof(struct nsim_vf_config),
GFP_KERNEL | __GFP_NOWARN);
if (!nsim_bus_dev->vfconfigs) {
err = -ENOMEM;
goto err_nsim_bus_dev_id_free;
}

err = device_register(&nsim_bus_dev->dev);
if (err)
goto err_nsim_bus_dev_id_free;
goto err_nsim_vfs_free;

return nsim_bus_dev;

err_nsim_vfs_free:
kfree(nsim_bus_dev->vfconfigs);
err_nsim_bus_dev_id_free:
ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
err_nsim_bus_dev_free:
Expand All @@ -351,6 +449,7 @@ static void nsim_bus_dev_del(struct nsim_bus_dev *nsim_bus_dev)
smp_store_release(&nsim_bus_dev->init, false);
device_unregister(&nsim_bus_dev->dev);
ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
kfree(nsim_bus_dev->vfconfigs);
kfree(nsim_bus_dev);
}

Expand Down
Loading

0 comments on commit 270d47d

Please sign in to comment.