Merge branch 'QED-NVMeTCP-Offload'
Shai Malin says:

====================
QED NVMeTCP Offload

Intro:
======
This is the qed part of Marvell’s NVMeTCP offload series, shared as
RFC series "NVMeTCP Offload ULP and QEDN Device Driver".
This part is a standalone series, and is not dependent on other parts
of the RFC.
The overall goal is to add qedn as the offload driver for NVMeTCP,
alongside the existing offload drivers (qedr, qedi and qedf for rdma,
iscsi and fcoe respectively).

In this series we are making the necessary changes to qed to enable this
by exposing APIs for FW/HW initializations.

The qedn series (and required changes to NVMe stack) will be sent to the
linux-nvme mailing list.
More details on the upstream plan are in the "Upstream Plan" section
below.

The Series Patches:
===================
1. qed: Add TCP_ULP FW resource layout – replacing iSCSI when common
   with NVMeTCP.
2. qed: Add NVMeTCP Offload PF Level FW and HW HSI.
3. qed: Add NVMeTCP Offload Connection Level FW and HW HSI.
4. qed: Add support of HW filter block – enables redirecting NVMeTCP
   traffic to the dedicated PF.
5. qed: Add NVMeTCP Offload IO Level FW and HW HSI.
6. qed: Add NVMeTCP Offload IO Level FW Initializations.
7. qed: Add IP services APIs support – VLAN, IP routing and reserving
   TCP ports for the offload device.
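
Patch 1 can be pictured with a small userspace sketch. It assumes only what
the diff in this commit shows: the iSCSI and NVMeTCP personalities both
resolve to the shared TCP_ULP protocol ID. The enum names and values below
are illustrative stand-ins, not the qed HSI definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; NOT the real qed enums or their numeric values. */
enum pci_personality { PCI_ETH, PCI_FCOE, PCI_ISCSI, PCI_NVMETCP };
enum protocol_id { PROTO_NONE, PROTO_FCOE, PROTO_TCP_ULP };

/*
 * Both TCP-based storage personalities map to the one shared TCP_ULP
 * protocol ID, so they reuse the same connection/task resource layout.
 */
static enum protocol_id personality_to_proto(enum pci_personality p)
{
	switch (p) {
	case PCI_FCOE:
		return PROTO_FCOE;
	case PCI_ISCSI:
	case PCI_NVMETCP:
		return PROTO_TCP_ULP;
	default:
		return PROTO_NONE;
	}
}

/* True when two personalities would share the same FW resource layout. */
static bool same_resource_layout(enum pci_personality a,
				 enum pci_personality b)
{
	enum protocol_id pa = personality_to_proto(a);

	return pa != PROTO_NONE && pa == personality_to_proto(b);
}
```

In this model same_resource_layout(PCI_ISCSI, PCI_NVMETCP) is true, while
pairing either of them with PCI_FCOE is false.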

The NVMeTCP Offload:
====================
With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer, which will
be a new transport type called "tcp-offload" and will serve as an
abstraction layer to work with vendor specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only registering new transports.
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path
and those that are not offloaded (even on the same device).
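
The "new transport" idea can be sketched as a name-keyed registry in which
"tcp" and "tcp-offload" coexist and a connection picks its path by transport
name. This is a simplified userspace model under that assumption; the struct
and function names are hypothetical and are not the kernel's NVMe fabrics
API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a transport; not the kernel's transport ops struct. */
struct transport_ops {
	const char *name;
	int (*create_ctrl)(const char *traddr);
};

#define MAX_TRANSPORTS 8
static const struct transport_ops *registry[MAX_TRANSPORTS];
static size_t nr_transports;

/* Register a transport by name; fails on a duplicate name or a full table. */
static int register_transport(const struct transport_ops *ops)
{
	if (nr_transports == MAX_TRANSPORTS)
		return -1;
	for (size_t i = 0; i < nr_transports; i++)
		if (strcmp(registry[i]->name, ops->name) == 0)
			return -1;
	registry[nr_transports++] = ops;
	return 0;
}

/* Look up the transport a connection asked for, or NULL if unknown. */
static const struct transport_ops *lookup_transport(const char *name)
{
	for (size_t i = 0; i < nr_transports; i++)
		if (strcmp(registry[i]->name, name) == 0)
			return registry[i];
	return NULL;
}

/* Dummy callbacks: the return value only identifies which path ran. */
static int tcp_create_ctrl(const char *traddr)
{
	(void)traddr;
	return 1; /* software TCP path */
}

static int tcp_offload_create_ctrl(const char *traddr)
{
	(void)traddr;
	return 2; /* offloaded path */
}

static const struct transport_ops tcp_ops = {
	"tcp", tcp_create_ctrl
};
static const struct transport_ops tcp_offload_ops = {
	"tcp-offload", tcp_offload_create_ctrl
};
```

Registering both tcp_ops and tcp_offload_ops leaves the software path
untouched: a lookup for "tcp" still returns the software callbacks, and only
connections that name "tcp-offload" reach the offload callbacks.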

The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs)
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Vendor Specific Driver: *

             |        |             |
           [ qedr ]
                      |             |
                   [ qede ]
                                    |
                                  [ qedn ]

Performance:
============
With this implementation on top of the Marvell qedn driver (using the
Marvell FastLinQ NIC), we were able to demonstrate the following CPU
utilization improvement:

On AMD EPYC 7402, 2.80GHz, 28 cores:
- For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
  Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
  NVMeTCP offload.

On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
- For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
  Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
  NVMeTCP offload.

In addition, we were able to demonstrate the following latency improvement:
- For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
  Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
  with NVMeTCP offload.

  Improved the 99.99% tail latency from 570 usec with NVMeTCP SW to 91 usec
  with NVMeTCP offload.

The end-to-end offload latency was measured from fio while running against
a null device back end.
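
To put the before/after figures above on a common scale, the relative
reduction can be computed as 100 * (before - after) / before. The sketch
below only does that arithmetic on the numbers quoted in this section; the
struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage by which `after` is lower than `before`. */
static double percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}

/* Before/after pairs copied from the figures quoted in this section. */
struct measurement {
	const char *what;
	double sw;	/* NVMeTCP SW */
	double offload;	/* NVMeTCP offload */
};

static const struct measurement results[] = {
	{ "CPU utilization (%), AMD EPYC 7402, 16K reads",   15.1,  4.7 },
	{ "CPU utilization (%), Xeon Gold 5122, 512K reads", 16.3,  1.1 },
	{ "Average latency (usec), 200K read IOPS",         105.0, 39.0 },
	{ "99.99% tail latency (usec), 200K read IOPS",     570.0, 91.0 },
};

static const size_t nr_results = sizeof(results) / sizeof(results[0]);
```

Applied to the four rows, this gives reductions of roughly 68.9%, 93.3%,
62.9% and 84.0% respectively.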

The Marvell FastLinQ NIC HW engine:
====================================
The Marvell NIC HW engine is capable of offloading the entire TCP/IP
stack and managing up to 64K connections per PF. Use cases that are
already implemented and upstream include iWARP (by the Marvell qedr
driver) and iSCSI (by the Marvell qedi driver).
In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
and is able to manage the IO level even in the case of TCP
re-transmissions and out-of-order (OOO) events.
The HW engine enables direct data placement (including the data digest CRC
calculation and validation) and direct data transmission (including data
digest CRC calculation).
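
For reference, the data digest that the HW engine computes and validates can
be modelled in software. The sketch below assumes the digest is CRC32C
(Castagnoli), which is what the NVMe/TCP transport specification defines for
its digests; this commit text itself only says "data digest CRC". It is a
plain bitwise implementation for illustration, not the driver's or the
firmware's code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C: reflected polynomial 0x82F63B78, init and final XOR ~0. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++) {
			/* mask is all ones when the low bit is set, else 0 */
			uint32_t mask = 0u - (crc & 1u);

			crc = (crc >> 1) ^ (0x82F63B78u & mask);
		}
	}
	return ~crc;
}

/* Receive side: recompute the digest over the payload and compare. */
static bool data_digest_ok(const void *payload, size_t len,
			   uint32_t received_digest)
{
	return crc32c(payload, len) == received_digest;
}
```

The standard CRC32C check value applies: crc32c("123456789", 9) yields
0xE3069283. With offload, this per-byte work is what moves from the host CPU
to the NIC.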

The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a pci device driver
and will work with the Marvell FastLinQ NIC.
As part of the probe, the driver will register to the nvme_tcp_offload
(ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.

Upstream Plan:
=============
The RFC series "NVMeTCP Offload ULP and QEDN Device Driver"
https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/
was designed in a modular way so that part 1 (nvme-tcp-offload) and
part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2.

- Part 1 (RFC patches 1-8): NVMeTCP Offload ULP
  The nvme-tcp-offload patches will be sent to
  'linux-nvme@lists.infradead.org'.

- Part 2 (RFC patches 9-15): QED NVMeTCP Offload
  The qed infrastructure will be sent to 'netdev@vger.kernel.org'.

Once parts 1 and 2 are accepted:

- Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload
  The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller committed Jun 3, 2021
2 parents 2c95e6c + 806ee7f commit eda1bc6
Showing 25 changed files with 2,650 additions and 52 deletions.
3 changes: 3 additions & 0 deletions drivers/net/ethernet/qlogic/Kconfig
@@ -110,6 +110,9 @@ config QED_RDMA
 config QED_ISCSI
 	bool
 
+config QED_NVMETCP
+	bool
+
 config QED_FCOE
 	bool
 
5 changes: 5 additions & 0 deletions drivers/net/ethernet/qlogic/qed/Makefile
@@ -28,6 +28,11 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o
 qed-$(CONFIG_QED_LL2) += qed_ll2.o
 qed-$(CONFIG_QED_OOO) += qed_ooo.o
 
+qed-$(CONFIG_QED_NVMETCP) += \
+	qed_nvmetcp.o \
+	qed_nvmetcp_fw_funcs.o \
+	qed_nvmetcp_ip_services.o
+
 qed-$(CONFIG_QED_RDMA) += \
 	qed_iwarp.o \
 	qed_rdma.o \
14 changes: 14 additions & 0 deletions drivers/net/ethernet/qlogic/qed/qed.h
@@ -49,6 +49,8 @@ extern const struct qed_common_ops qed_common_ops_pass;
 #define QED_MIN_WIDS (4)
 #define QED_PF_DEMS_SIZE (4)
 
+#define QED_LLH_DONT_CARE 0
+
 /* cau states */
 enum qed_coalescing_mode {
 	QED_COAL_MODE_DISABLE,
@@ -200,6 +202,7 @@ enum qed_pci_personality {
 	QED_PCI_ETH,
 	QED_PCI_FCOE,
 	QED_PCI_ISCSI,
+	QED_PCI_NVMETCP,
 	QED_PCI_ETH_ROCE,
 	QED_PCI_ETH_IWARP,
 	QED_PCI_ETH_RDMA,
@@ -239,6 +242,7 @@ enum QED_FEATURE {
 	QED_PF_L2_QUE,
 	QED_VF,
 	QED_RDMA_CNQ,
+	QED_NVMETCP_CQ,
 	QED_ISCSI_CQ,
 	QED_FCOE_CQ,
 	QED_VF_L2_QUE,
@@ -284,6 +288,8 @@ struct qed_hw_info {
 	((dev)->hw_info.personality == QED_PCI_FCOE)
 #define QED_IS_ISCSI_PERSONALITY(dev) \
 	((dev)->hw_info.personality == QED_PCI_ISCSI)
+#define QED_IS_NVMETCP_PERSONALITY(dev) \
+	((dev)->hw_info.personality == QED_PCI_NVMETCP)
 
 	/* Resource Allocation scheme results */
 	u32 resc_start[QED_MAX_RESC];
@@ -592,6 +598,7 @@ struct qed_hwfn {
 	struct qed_ooo_info *p_ooo_info;
 	struct qed_rdma_info *p_rdma_info;
 	struct qed_iscsi_info *p_iscsi_info;
+	struct qed_nvmetcp_info *p_nvmetcp_info;
 	struct qed_fcoe_info *p_fcoe_info;
 	struct qed_pf_params pf_params;
 
@@ -828,6 +835,7 @@ struct qed_dev {
 		struct qed_eth_cb_ops *eth;
 		struct qed_fcoe_cb_ops *fcoe;
 		struct qed_iscsi_cb_ops *iscsi;
+		struct qed_nvmetcp_cb_ops *nvmetcp;
 	} protocol_ops;
 	void *ops_cookie;
 
@@ -999,4 +1007,10 @@ int qed_mfw_fill_tlv_data(struct qed_hwfn *hwfn,
 void qed_hw_info_set_offload_tc(struct qed_hw_info *p_info, u8 tc);
 
 void qed_periodic_db_rec_start(struct qed_hwfn *p_hwfn);
+
+int qed_llh_add_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port);
+int qed_llh_add_dst_tcp_port_filter(struct qed_dev *cdev, u16 dest_port);
+void qed_llh_remove_src_tcp_port_filter(struct qed_dev *cdev, u16 src_port);
+void qed_llh_remove_dst_tcp_port_filter(struct qed_dev *cdev, u16 src_port);
+void qed_llh_clear_all_filters(struct qed_dev *cdev);
 #endif /* _QED_H */
45 changes: 34 additions & 11 deletions drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -94,14 +94,14 @@ struct src_ent {
 
 static bool src_proto(enum protocol_type type)
 {
-	return type == PROTOCOLID_ISCSI ||
+	return type == PROTOCOLID_TCP_ULP ||
 	       type == PROTOCOLID_FCOE ||
 	       type == PROTOCOLID_IWARP;
 }
 
 static bool tm_cid_proto(enum protocol_type type)
 {
-	return type == PROTOCOLID_ISCSI ||
+	return type == PROTOCOLID_TCP_ULP ||
 	       type == PROTOCOLID_FCOE ||
 	       type == PROTOCOLID_ROCE ||
 	       type == PROTOCOLID_IWARP;
@@ -2072,7 +2072,6 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
 						    PROTOCOLID_FCOE,
 						    p_params->num_cons,
 						    0);
-
 			qed_cxt_set_proto_tid_count(p_hwfn, PROTOCOLID_FCOE,
 						    QED_CXT_FCOE_TID_SEG, 0,
 						    p_params->num_tasks, true);
@@ -2090,13 +2089,12 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
 
 		if (p_params->num_cons && p_params->num_tasks) {
 			qed_cxt_set_proto_cid_count(p_hwfn,
-						    PROTOCOLID_ISCSI,
+						    PROTOCOLID_TCP_ULP,
 						    p_params->num_cons,
 						    0);
-
 			qed_cxt_set_proto_tid_count(p_hwfn,
-						    PROTOCOLID_ISCSI,
-						    QED_CXT_ISCSI_TID_SEG,
+						    PROTOCOLID_TCP_ULP,
+						    QED_CXT_TCP_ULP_TID_SEG,
 						    0,
 						    p_params->num_tasks,
 						    true);
@@ -2106,6 +2104,29 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)
 		}
 		break;
 	}
+	case QED_PCI_NVMETCP:
+	{
+		struct qed_nvmetcp_pf_params *p_params;
+
+		p_params = &p_hwfn->pf_params.nvmetcp_pf_params;
+
+		if (p_params->num_cons && p_params->num_tasks) {
+			qed_cxt_set_proto_cid_count(p_hwfn,
+						    PROTOCOLID_TCP_ULP,
+						    p_params->num_cons,
+						    0);
+			qed_cxt_set_proto_tid_count(p_hwfn,
+						    PROTOCOLID_TCP_ULP,
+						    QED_CXT_TCP_ULP_TID_SEG,
+						    0,
+						    p_params->num_tasks,
+						    true);
+		} else {
+			DP_INFO(p_hwfn->cdev,
+				"NvmeTCP personality used without setting params!\n");
+		}
+		break;
+	}
 	default:
 		return -EINVAL;
 	}
Expand All @@ -2129,8 +2150,9 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
seg = QED_CXT_FCOE_TID_SEG;
break;
case QED_PCI_ISCSI:
proto = PROTOCOLID_ISCSI;
seg = QED_CXT_ISCSI_TID_SEG;
case QED_PCI_NVMETCP:
proto = PROTOCOLID_TCP_ULP;
seg = QED_CXT_TCP_ULP_TID_SEG;
break;
default:
return -EINVAL;
@@ -2455,8 +2477,9 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,
 		seg = QED_CXT_FCOE_TID_SEG;
 		break;
 	case QED_PCI_ISCSI:
-		proto = PROTOCOLID_ISCSI;
-		seg = QED_CXT_ISCSI_TID_SEG;
+	case QED_PCI_NVMETCP:
+		proto = PROTOCOLID_TCP_ULP;
+		seg = QED_CXT_TCP_ULP_TID_SEG;
 		break;
 	default:
 		return -EINVAL;
2 changes: 1 addition & 1 deletion drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -50,7 +50,7 @@ int qed_cxt_get_cid_info(struct qed_hwfn *p_hwfn,
 int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,
 			     struct qed_tid_mem *p_info);
 
-#define QED_CXT_ISCSI_TID_SEG PROTOCOLID_ISCSI
+#define QED_CXT_TCP_ULP_TID_SEG PROTOCOLID_TCP_ULP
 #define QED_CXT_ROCE_TID_SEG PROTOCOLID_ROCE
 #define QED_CXT_FCOE_TID_SEG PROTOCOLID_FCOE
 enum qed_cxt_elem_type {