Merge branch 'mlx5-connectx-4-sriov'
Or Gerlitz says:

====================
Introducing ConnectX-4 Ethernet SRIOV

This patchset introduces support for Ethernet SRIOV in the ConnectX-4
family of 100G Ethernet NICs.

Some features are still missing, but all the basic SRIOV functionalities
are there already.

Basic Introduction:
ConnectX-4 HW architecture provides two kinds of underlying HW switches.

MPFS (Multi Physical Function Switch) or L2 Table in Software terms:

The HCA has one MPFS switch per physical port. This switch is responsible
for forwarding unicast traffic to the various overlying Physical Functions (PFs).
Multicast traffic is flooded amongst all the PFs. Each PF can request to
forward a unicast MAC to its E-Switch Uplink vport (which we will cover later)
through the SET_L2_TABLE_ENTRY HW command.

MPFS has five ports: four are connected to PFs (one for each) and one is connected
directly to the Physical Port (Physical Link).
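
As a rough illustration of the MPFS behaviour described above, the sketch below
models the receive direction only: a frame arriving from the physical port is
flooded to every PF when it is multicast, and steered by an L2 table when it is
unicast. All names (sketch_l2_entry, sketch_mpfs_rx_targets) are hypothetical
and are not driver or firmware interfaces. What the hardware does with an
unmatched unicast frame is not stated here, so the sketch simply assumes it
reaches no PF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define SKETCH_MPFS_NUM_PFS 4	/* four PF-facing ports, per the text above */

/* Hypothetical L2 table entry: a unicast MAC claimed by one PF. */
struct sketch_l2_entry {
	uint8_t mac[ETH_ALEN];
	unsigned int pf;	/* 0..SKETCH_MPFS_NUM_PFS-1 */
};

/*
 * Return a bitmask of the PFs that receive a frame arriving from the
 * physical port. Bit i set means PF i gets a copy.
 */
static unsigned int sketch_mpfs_rx_targets(const struct sketch_l2_entry *tbl,
					   size_t n,
					   const uint8_t dmac[ETH_ALEN])
{
	size_t i;

	assert(tbl != NULL || n == 0);

	/* Group bit set: multicast (or broadcast), flood to every PF. */
	if (dmac[0] & 0x01)
		return (1u << SKETCH_MPFS_NUM_PFS) - 1;

	/* Unicast: forward to the PF that registered this MAC. */
	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].mac, dmac, ETH_ALEN) == 0)
			return 1u << tbl[i].pf;
	}

	/* Assumption of this sketch: unmatched unicast reaches no PF. */
	return 0;
}
```

In this model, a PF issuing SET_L2_TABLE_ENTRY corresponds to appending one
entry to the table passed in as tbl.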

E-Switch (Ethernet Switch):

The HCA has one E-Switch per physical function. The main responsibility of this
component is to forward unicast/multicast and VLAN tagged/untagged traffic to the
various Virtual Functions (VFs) allocated by the PF. Unlike MPFS, the PF needs to
explicitly create the E-Switch FDB table, which is a HW flow table managed by the
PF driver whenever the vport_group_manager capability bit is set for this PF.

E-Switch has Virtual Port (vport) entities as its ports. vport0 and the uplink vport
are special kinds of vports: vport0 represents the PF, and the uplink vport is
connected to the MPFS switch (if it exists) as the PF external link.
vport1..vportN represent VF0..VF(N-1) egress/ingress ports.
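
The numbering above implies a fixed offset of one between a zero-based VF index
and its vport number, which matches the "vf + 1" arguments passed by the SRIOV
ndos in the en_main.c part of this merge. A minimal sketch of that mapping,
using hypothetical helper names that do not exist in the driver:

```c
#include <assert.h>

/* vport 0 is the PF, so VF i (zero-based) sits behind vport i + 1. */
static int sketch_vf_to_vport(int vf)
{
	assert(vf >= 0);
	return vf + 1;
}

/* Inverse mapping; only meaningful for VF vports (vport >= 1). */
static int sketch_vport_to_vf(int vport)
{
	assert(vport >= 1);
	return vport - 1;
}
```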

E-Switch FDB contains forwarding rules such as:
        UC MAC0 -> vport0(PF).
        UC MAC1 -> vport1.
        UC MAC2 -> vport2.
        MC MACX -> vport0, vport2, Uplink.
        MC MACY -> vport1, Uplink.

    For unmatched traffic FDB has the following default rules:
        Unmatched Traffic (src vport != Uplink) -> Uplink.
        Unmatched Traffic (src vport == Uplink) -> vport0(PF).
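
The unicast part of these rules, including the two defaults, can be sketched as
a plain table lookup. This is only an illustration of the forwarding decision,
not how the hardware flow table is programmed. The types, the function name and
the SKETCH_UPLINK_VPORT value are placeholders chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
/* Placeholder vport number standing in for the uplink (assumption). */
#define SKETCH_UPLINK_VPORT 0xffff
#define SKETCH_PF_VPORT 0

/* One unicast FDB rule: destination MAC -> destination vport. */
struct sketch_fdb_rule {
	uint8_t mac[ETH_ALEN];
	uint16_t vport;
};

struct sketch_fdb {
	const struct sketch_fdb_rule *rules;
	size_t num_rules;
};

/* Return the destination vport for a unicast frame. */
static uint16_t sketch_fdb_lookup_uc(const struct sketch_fdb *fdb,
				     const uint8_t dmac[ETH_ALEN],
				     uint16_t src_vport)
{
	size_t i;

	assert(fdb != NULL);

	/* Explicit rules: UC MACn -> vportn. */
	for (i = 0; i < fdb->num_rules; i++) {
		if (memcmp(fdb->rules[i].mac, dmac, ETH_ALEN) == 0)
			return fdb->rules[i].vport;
	}

	/* Default rules for unmatched traffic. */
	if (src_vport != SKETCH_UPLINK_VPORT)
		return SKETCH_UPLINK_VPORT;	/* local sender: send out */
	return SKETCH_PF_VPORT;			/* from the wire: give to PF */
}
```

Multicast rules would need a list of destination vports per MAC and are left
out to keep the sketch short.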

NIC VPort context:
Each NIC (VF/PF) has its own vport context, which stores the current NIC vport
state (UC/MC and VLAN lists) and other NIC properties such as MTU, promisc
mode, etc. The NIC (VF/PF) driver is responsible for keeping this context up to date.

FDB rules population:
Each NIC vport (VF/PF) notifies the E-Switch manager of its UC/MC vport
context changes via the modify vport context command. The command is
translated to an event handled by the E-Switch manager (PF), which
updates the FDB table accordingly.
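
One way to picture the manager side of this flow: on each vport change event
the manager reads the vport's current UC list and reconciles the FDB against
it, removing rules for addresses that disappeared and adding rules for new
ones. The sketch below shows that reconciliation over a small fixed-size table.
It is an assumption about the general shape of such a handler, with made-up
names and a made-up table layout, and not the eswitch.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define SKETCH_FDB_SIZE 16	/* arbitrary capacity for the sketch */

struct sketch_fdb_entry {
	bool used;
	uint16_t vport;
	uint8_t mac[ETH_ALEN];
};

struct sketch_fdb {
	struct sketch_fdb_entry e[SKETCH_FDB_SIZE];
};

static bool sketch_in_list(const uint8_t (*list)[ETH_ALEN], size_t n,
			   const uint8_t *mac)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(list[i], mac, ETH_ALEN) == 0)
			return true;
	return false;
}

static struct sketch_fdb_entry *sketch_find(struct sketch_fdb *fdb,
					    uint16_t vport, const uint8_t *mac)
{
	size_t i;

	for (i = 0; i < SKETCH_FDB_SIZE; i++) {
		struct sketch_fdb_entry *e = &fdb->e[i];

		if (e->used && e->vport == vport &&
		    memcmp(e->mac, mac, ETH_ALEN) == 0)
			return e;
	}
	return NULL;
}

static size_t sketch_count_rules(const struct sketch_fdb *fdb, uint16_t vport)
{
	size_t i, count = 0;

	for (i = 0; i < SKETCH_FDB_SIZE; i++)
		if (fdb->e[i].used && fdb->e[i].vport == vport)
			count++;
	return count;
}

/*
 * Make the rules of 'vport' match 'list' (the UC list read from its
 * vport context). Returns 0 on success, -1 if the table is full.
 */
static int sketch_handle_vport_change(struct sketch_fdb *fdb, uint16_t vport,
				      const uint8_t (*list)[ETH_ALEN], size_t n)
{
	size_t i, j;

	assert(fdb != NULL);

	/* Drop rules whose MAC is no longer in the vport's list. */
	for (i = 0; i < SKETCH_FDB_SIZE; i++) {
		struct sketch_fdb_entry *e = &fdb->e[i];

		if (e->used && e->vport == vport &&
		    !sketch_in_list(list, n, e->mac))
			e->used = false;
	}

	/* Add a rule for every listed MAC that has none yet. */
	for (j = 0; j < n; j++) {
		if (sketch_find(fdb, vport, list[j]))
			continue;
		for (i = 0; i < SKETCH_FDB_SIZE && fdb->e[i].used; i++)
			;
		if (i == SKETCH_FDB_SIZE)
			return -1;
		fdb->e[i].used = true;
		fdb->e[i].vport = vport;
		memcpy(fdb->e[i].mac, list[j], ETH_ALEN);
	}
	return 0;
}
```

Reconciling against the full list, rather than applying individual add/remove
notifications, keeps the handler idempotent: replaying the same event leaves
the table unchanged.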

Both PF and VF use the same driver and submit commands directly to the firmware.
The PF sees the vport_group_manager capability bit and as such runs the code
to populate the embedded switches as explained above.

The patchset goes as follows:

Patches 1-2 introduce the basic PCI SRIOV functionality and ConnectX-4 support
for enabling specific VFs via the enable/disable HCA commands. These two
patches will also be used later for the IB SRIOV flow.

Patches 3-8 introduce the basic E-Switch capabilities and commands to be used later by
the VF to modify and update its NIC vport context, and by the PF (E-Switch Manager)
driver to query the VF NIC context and act accordingly.

Patches 9-10 provide the functionality a NIC driver (VF/PF) needs to support SRIOV,
mainly vport context update support.

Patch 11 ("net/mlx5: Introducing E-Switch and l2 table") introduces the basic
E-Switch support and infrastructure to read vport context events and to update
the MPFS L2 Table with the UC MAC addresses requested by the PF.

Patches 12-18 introduce SRIOV enablement and E-Switch FDB table management.
They add the basic E-Switch public API to set and get SRIOV properties, to be used
in the PF netdev SRIOV ndos.

Patchset was applied on top of commit 3f8c0f7 "gianfar: use of_property_read_bool()"

Saeed, Eli and Or.

changes from V0, addressed feedback from Alex Duyck:
 - patch 09, remove the loop to seek the device address
 - patch 09, avoid using array as returned value from helper function
 - patch 10, fix possible buffer over-run

changes from V1, addressed feedback from Julia Lawall and kbuild test robot:
 - patch 11 check the right variable for allocation failure
 - patch 18 eliminated unneeded semicolon
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Dec 3, 2015
2 parents 24e2416 + 66e49de commit c5b6c3e
Showing 18 changed files with 2,742 additions and 71 deletions.
4 changes: 2 additions & 2 deletions drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o

mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
-mad.o transobj.o vport.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o \
+mad.o transobj.o vport.o sriov.o
+mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o flow_table.o eswitch.o \
en_main.o en_flow_table.o en_ethtool.o en_tx.o en_rx.o \
en_txrx.o
1 change: 1 addition & 0 deletions drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,6 +465,7 @@ enum {
};

struct mlx5e_vlan_db {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
u32 active_vlans_ft_ix[VLAN_N_VID];
u32 untagged_rule_ft_ix;
u32 any_vlan_rule_ft_ix;
139 changes: 139 additions & 0 deletions drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c
@@ -502,6 +502,49 @@ static int mlx5e_add_eth_addr_rule(struct mlx5e_priv *priv,
return err;
}

static int mlx5e_vport_context_update_vlans(struct mlx5e_priv *priv)
{
struct net_device *ndev = priv->netdev;
int max_list_size;
int list_size;
u16 *vlans;
int vlan;
int err;
int i;

list_size = 0;
for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID)
list_size++;

max_list_size = 1 << MLX5_CAP_GEN(priv->mdev, log_max_vlan_list);

if (list_size > max_list_size) {
netdev_warn(ndev,
"netdev vlans list size (%d) > (%d) max vport list size, some vlans will be dropped\n",
list_size, max_list_size);
list_size = max_list_size;
}

vlans = kcalloc(list_size, sizeof(*vlans), GFP_KERNEL);
if (!vlans)
return -ENOMEM;

i = 0;
for_each_set_bit(vlan, priv->vlan.active_vlans, VLAN_N_VID) {
if (i >= list_size)
break;
vlans[i++] = vlan;
}

err = mlx5_modify_nic_vport_vlans(priv->mdev, vlans, list_size);
if (err)
netdev_err(ndev, "Failed to modify vport vlans list err(%d)\n",
err);

kfree(vlans);
return err;
}

enum mlx5e_vlan_rule_type {
MLX5E_VLAN_RULE_TYPE_UNTAGGED,
MLX5E_VLAN_RULE_TYPE_ANY_VID,
@@ -552,6 +595,10 @@ static int mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
1);
break;
default: /* MLX5E_VLAN_RULE_TYPE_MATCH_VID */
err = mlx5e_vport_context_update_vlans(priv);
if (err)
goto add_vlan_rule_out;

ft_ix = &priv->vlan.active_vlans_ft_ix[vid];
MLX5_SET(fte_match_param, match_value, outer_headers.vlan_tag,
1);
@@ -588,6 +635,7 @@ static void mlx5e_del_vlan_rule(struct mlx5e_priv *priv,
case MLX5E_VLAN_RULE_TYPE_MATCH_VID:
mlx5_del_flow_table_entry(priv->ft.vlan,
priv->vlan.active_vlans_ft_ix[vid]);
mlx5e_vport_context_update_vlans(priv);
break;
}
}
@@ -619,6 +667,8 @@ int mlx5e_vlan_rx_add_vid(struct net_device *dev, __always_unused __be16 proto,
{
struct mlx5e_priv *priv = netdev_priv(dev);

set_bit(vid, priv->vlan.active_vlans);

return mlx5e_add_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);
}

@@ -627,6 +677,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
{
struct mlx5e_priv *priv = netdev_priv(dev);

clear_bit(vid, priv->vlan.active_vlans);

mlx5e_del_vlan_rule(priv, MLX5E_VLAN_RULE_TYPE_MATCH_VID, vid);

return 0;
@@ -671,6 +723,91 @@ static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv)
netif_addr_unlock_bh(netdev);
}

static void mlx5e_fill_addr_array(struct mlx5e_priv *priv, int list_type,
u8 addr_array[][ETH_ALEN], int size)
{
bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
struct net_device *ndev = priv->netdev;
struct mlx5e_eth_addr_hash_node *hn;
struct hlist_head *addr_list;
struct hlist_node *tmp;
int i = 0;
int hi;

addr_list = is_uc ? priv->eth_addr.netdev_uc : priv->eth_addr.netdev_mc;

if (is_uc) /* Make sure our own address is pushed first */
ether_addr_copy(addr_array[i++], ndev->dev_addr);
else if (priv->eth_addr.broadcast_enabled)
ether_addr_copy(addr_array[i++], ndev->broadcast);

mlx5e_for_each_hash_node(hn, tmp, addr_list, hi) {
if (ether_addr_equal(ndev->dev_addr, hn->ai.addr))
continue;
if (i >= size)
break;
ether_addr_copy(addr_array[i++], hn->ai.addr);
}
}

static void mlx5e_vport_context_update_addr_list(struct mlx5e_priv *priv,
int list_type)
{
bool is_uc = (list_type == MLX5_NVPRT_LIST_TYPE_UC);
struct mlx5e_eth_addr_hash_node *hn;
u8 (*addr_array)[ETH_ALEN] = NULL;
struct hlist_head *addr_list;
struct hlist_node *tmp;
int max_size;
int size;
int err;
int hi;

size = is_uc ? 0 : (priv->eth_addr.broadcast_enabled ? 1 : 0);
max_size = is_uc ?
1 << MLX5_CAP_GEN(priv->mdev, log_max_current_uc_list) :
1 << MLX5_CAP_GEN(priv->mdev, log_max_current_mc_list);

addr_list = is_uc ? priv->eth_addr.netdev_uc : priv->eth_addr.netdev_mc;
mlx5e_for_each_hash_node(hn, tmp, addr_list, hi)
size++;

if (size > max_size) {
netdev_warn(priv->netdev,
"netdev %s list size (%d) > (%d) max vport list size, some addresses will be dropped\n",
is_uc ? "UC" : "MC", size, max_size);
size = max_size;
}

if (size) {
addr_array = kcalloc(size, ETH_ALEN, GFP_KERNEL);
if (!addr_array) {
err = -ENOMEM;
goto out;
}
mlx5e_fill_addr_array(priv, list_type, addr_array, size);
}

err = mlx5_modify_nic_vport_mac_list(priv->mdev, list_type, addr_array, size);
out:
if (err)
netdev_err(priv->netdev,
"Failed to modify vport %s list err(%d)\n",
is_uc ? "UC" : "MC", err);
kfree(addr_array);
}

static void mlx5e_vport_context_update(struct mlx5e_priv *priv)
{
struct mlx5e_eth_addr_db *ea = &priv->eth_addr;

mlx5e_vport_context_update_addr_list(priv, MLX5_NVPRT_LIST_TYPE_UC);
mlx5e_vport_context_update_addr_list(priv, MLX5_NVPRT_LIST_TYPE_MC);
mlx5_modify_nic_vport_promisc(priv->mdev, 0,
ea->allmulti_enabled,
ea->promisc_enabled);
}

static void mlx5e_apply_netdev_addr(struct mlx5e_priv *priv)
{
struct mlx5e_eth_addr_hash_node *hn;
@@ -748,6 +885,8 @@ void mlx5e_set_rx_mode_work(struct work_struct *work)
ea->promisc_enabled = promisc_enabled;
ea->allmulti_enabled = allmulti_enabled;
ea->broadcast_enabled = broadcast_enabled;

mlx5e_vport_context_update(priv);
}

void mlx5e_init_eth_addr(struct mlx5e_priv *priv)
88 changes: 85 additions & 3 deletions drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -32,6 +32,7 @@

#include <linux/mlx5/flow_table.h>
#include "en.h"
#include "eswitch.h"

struct mlx5e_rq_param {
u32 rqc[MLX5_ST_SZ_DW(rqc)];
@@ -63,7 +64,7 @@ static void mlx5e_update_carrier(struct mlx5e_priv *priv)
u8 port_state;

port_state = mlx5_query_vport_state(mdev,
-MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT);
+MLX5_QUERY_VPORT_STATE_IN_OP_MOD_VNIC_VPORT, 0);

if (port_state == VPORT_STATE_UP)
netif_carrier_on(priv->netdev);
@@ -1931,6 +1932,79 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
return err;
}

static int mlx5e_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;

return mlx5_eswitch_set_vport_mac(mdev->priv.eswitch, vf + 1, mac);
}

static int mlx5e_set_vf_vlan(struct net_device *dev, int vf, u16 vlan, u8 qos)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;

return mlx5_eswitch_set_vport_vlan(mdev->priv.eswitch, vf + 1,
vlan, qos);
}

static int mlx5_vport_link2ifla(u8 esw_link)
{
switch (esw_link) {
case MLX5_ESW_VPORT_ADMIN_STATE_DOWN:
return IFLA_VF_LINK_STATE_DISABLE;
case MLX5_ESW_VPORT_ADMIN_STATE_UP:
return IFLA_VF_LINK_STATE_ENABLE;
}
return IFLA_VF_LINK_STATE_AUTO;
}

static int mlx5_ifla_link2vport(u8 ifla_link)
{
switch (ifla_link) {
case IFLA_VF_LINK_STATE_DISABLE:
return MLX5_ESW_VPORT_ADMIN_STATE_DOWN;
case IFLA_VF_LINK_STATE_ENABLE:
return MLX5_ESW_VPORT_ADMIN_STATE_UP;
}
return MLX5_ESW_VPORT_ADMIN_STATE_AUTO;
}

static int mlx5e_set_vf_link_state(struct net_device *dev, int vf,
int link_state)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;

return mlx5_eswitch_set_vport_state(mdev->priv.eswitch, vf + 1,
mlx5_ifla_link2vport(link_state));
}

static int mlx5e_get_vf_config(struct net_device *dev,
int vf, struct ifla_vf_info *ivi)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;
int err;

err = mlx5_eswitch_get_vport_config(mdev->priv.eswitch, vf + 1, ivi);
if (err)
return err;
ivi->linkstate = mlx5_vport_link2ifla(ivi->linkstate);
return 0;
}

static int mlx5e_get_vf_stats(struct net_device *dev,
int vf, struct ifla_vf_stats *vf_stats)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;

return mlx5_eswitch_get_vport_stats(mdev->priv.eswitch, vf + 1,
vf_stats);
}

static struct net_device_ops mlx5e_netdev_ops = {
.ndo_open = mlx5e_open,
.ndo_stop = mlx5e_close,
@@ -1941,7 +2015,7 @@ static struct net_device_ops mlx5e_netdev_ops = {
.ndo_vlan_rx_add_vid = mlx5e_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = mlx5e_vlan_rx_kill_vid,
.ndo_set_features = mlx5e_set_features,
-.ndo_change_mtu = mlx5e_change_mtu,
+.ndo_change_mtu = mlx5e_change_mtu
};

static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
@@ -2028,7 +2102,7 @@ static void mlx5e_set_netdev_dev_addr(struct net_device *netdev)
{
struct mlx5e_priv *priv = netdev_priv(netdev);

-mlx5_query_nic_vport_mac_address(priv->mdev, netdev->dev_addr);
+mlx5_query_nic_vport_mac_address(priv->mdev, 0, netdev->dev_addr);
}

static void mlx5e_build_netdev(struct net_device *netdev)
@@ -2041,6 +2115,14 @@ static void mlx5e_build_netdev(struct net_device *netdev)
if (priv->params.num_tc > 1)
mlx5e_netdev_ops.ndo_select_queue = mlx5e_select_queue;

if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
mlx5e_netdev_ops.ndo_set_vf_mac = mlx5e_set_vf_mac;
mlx5e_netdev_ops.ndo_set_vf_vlan = mlx5e_set_vf_vlan;
mlx5e_netdev_ops.ndo_get_vf_config = mlx5e_get_vf_config;
mlx5e_netdev_ops.ndo_set_vf_link_state = mlx5e_set_vf_link_state;
mlx5e_netdev_ops.ndo_get_vf_stats = mlx5e_get_vf_stats;
}

netdev->netdev_ops = &mlx5e_netdev_ops;
netdev->watchdog_timeo = 15 * HZ;

13 changes: 13 additions & 0 deletions drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -35,6 +35,9 @@
#include <linux/mlx5/driver.h>
#include <linux/mlx5/cmd.h>
#include "mlx5_core.h"
#ifdef CONFIG_MLX5_CORE_EN
#include "eswitch.h"
#endif

enum {
MLX5_EQE_SIZE = sizeof(struct mlx5_eqe),
@@ -287,6 +290,11 @@ static int mlx5_eq_int(struct mlx5_core_dev *dev, struct mlx5_eq *eq)
break;
#endif

#ifdef CONFIG_MLX5_CORE_EN
case MLX5_EVENT_TYPE_NIC_VPORT_CHANGE:
mlx5_eswitch_vport_event(dev->priv.eswitch, eqe);
break;
#endif
default:
mlx5_core_warn(dev, "Unhandled event 0x%x on EQ 0x%x\n",
eqe->type, eq->eqn);
@@ -459,6 +467,11 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, pg))
async_event_mask |= (1ull << MLX5_EVENT_TYPE_PAGE_FAULT);

if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH &&
MLX5_CAP_GEN(dev, vport_group_manager) &&
mlx5_core_is_pf(dev))
async_event_mask |= (1ull << MLX5_EVENT_TYPE_NIC_VPORT_CHANGE);

err = mlx5_create_map_eq(dev, &table->cmd_eq, MLX5_EQ_VEC_CMD,
MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
"mlx5_cmd_eq", &dev->priv.uuari.uars[0]);