crypto: qat - add support for device telemetry
Expose through debugfs device telemetry data for QAT GEN4 devices.

This allows gathering metrics about the performance and utilization of
a device, in particular statistics on (1) the utilization of the PCIe
channel, (2) address translation, when SVA is enabled, and (3) the
internal engines for crypto and data compression.

If telemetry is supported by the firmware, the driver allocates a DMA
region and a circular buffer. When telemetry is enabled, through the
`control` attribute in debugfs, the driver sends to the firmware, via
the admin interface, the `TL_START` command. This triggers the device to
periodically gather telemetry data from hardware registers and write it
into the DMA memory region. The device writes into the shared region
every second.

Every 500ms, the driver snapshots the DMA shared region into the
circular buffer. The buffer is then used to compute basic metrics
(min/max/average) on each counter every time the `device_data` attribute
is queried.
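
The snapshot-and-summarize scheme described above can be sketched in
userspace C. This is only an illustration, not the driver's code: the
names `hist_ring`, `hist_push` and `hist_compute`, and the `HIST_MAX`
bound of 4 (chosen to match the largest value accepted by `control`),
are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_MAX 4 /* assumed bound, matching the largest `control` value */

/* Hypothetical ring of per-counter snapshots (not the driver's layout). */
struct hist_ring {
	uint64_t samples[HIST_MAX];
	size_t capacity; /* slots in use, 1..HIST_MAX */
	size_t count;    /* valid samples, saturates at capacity */
	size_t head;     /* slot the next snapshot is written to */
};

struct hist_stats {
	uint64_t min;
	uint64_t max;
	uint64_t avg;
};

void hist_init(struct hist_ring *r, size_t capacity)
{
	assert(capacity >= 1 && capacity <= HIST_MAX);
	r->capacity = capacity;
	r->count = 0;
	r->head = 0;
}

/* Store one snapshot, overwriting the oldest one once the ring is full. */
void hist_push(struct hist_ring *r, uint64_t value)
{
	r->samples[r->head] = value;
	r->head = (r->head + 1) % r->capacity;
	if (r->count < r->capacity)
		r->count++;
}

/*
 * Compute min/max/average over the stored snapshots.
 * Returns 0 on success, -1 if no snapshot has been stored yet.
 * The sum is not guarded against overflow in this sketch.
 */
int hist_compute(const struct hist_ring *r, struct hist_stats *out)
{
	uint64_t sum = 0;
	size_t i;

	if (r->count == 0)
		return -1;

	out->min = UINT64_MAX;
	out->max = 0;
	/* Valid snapshots always occupy slots 0..count-1. */
	for (i = 0; i < r->count; i++) {
		uint64_t v = r->samples[i];

		if (v < out->min)
			out->min = v;
		if (v > out->max)
			out->max = v;
		sum += v;
	}
	out->avg = sum / r->count;
	return 0;
}
```

Computing the statistics on demand (at query time) rather than on every
snapshot keeps the periodic path to a single store, which is the trade-off
the paragraph above describes.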

Telemetry counters are exposed through debugfs in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Damian Muszynski <damian.muszynski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Lucas Segarra Fernandez authored and Herbert Xu committed Dec 29, 2023
1 parent 7f06679 commit 69e7649
Showing 13 changed files with 1,339 additions and 0 deletions.
103 changes: 103 additions & 0 deletions Documentation/ABI/testing/debugfs-driver-qat_telemetry
@@ -0,0 +1,103 @@
What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/control
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RW) Enables/disables the reporting of telemetry metrics.

Allowed values to write:
========================
* 0: disable telemetry
* 1: enable telemetry
* 2, 3, 4: enable telemetry and calculate minimum, maximum
and average for each counter over 2, 3 or 4 samples

Returned values:
================
* 1-4: telemetry is enabled and running
* 0: telemetry is disabled

Example.

Writing '3' to this file starts the collection of
telemetry metrics. Samples are collected every second and
stored in a circular buffer of size 3. These values are then
used to calculate the minimum, maximum and average for each
counter. After enabling, counters can be retrieved through
the ``device_data`` file::

    echo 3 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control

Writing '0' to this file stops the collection of telemetry
metrics::

    echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control

This attribute is only available for qat_4xxx devices.

What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
Date: March 2024
KernelVersion: 6.8
Contact: qat-linux@intel.com
Description: (RO) Reports device telemetry counters.
Reads report metrics about performance and utilization of
a QAT device:

======================= ========================================
Field                   Description
======================= ========================================
sample_cnt              number of acquisitions of telemetry data
                        from the device. Reads are performed
                        every 1000 ms.
pci_trans_cnt           number of PCIe partial transactions
max_rd_lat              maximum logged read latency [ns] (could
                        be any read operation)
rd_lat_acc_avg          average read latency [ns]
max_gp_lat              max get to put latency [ns] (only takes
                        samples for AE0)
gp_lat_acc_avg          average get to put latency [ns]
bw_in                   PCIe, write bandwidth [Mbps]
bw_out                  PCIe, read bandwidth [Mbps]
at_page_req_lat_avg     Address Translator(AT), average page
                        request latency [ns]
at_trans_lat_avg        AT, average page translation latency [ns]
at_max_tlb_used         AT, maximum uTLB used
util_cpr<N>             utilization of Compression slice N [%]
exec_cpr<N>             execution count of Compression slice N
util_xlt<N>             utilization of Translator slice N [%]
exec_xlt<N>             execution count of Translator slice N
util_dcpr<N>            utilization of Decompression slice N [%]
exec_dcpr<N>            execution count of Decompression slice N
util_pke<N>             utilization of PKE N [%]
exec_pke<N>             execution count of PKE N
util_ucs<N>             utilization of UCS slice N [%]
exec_ucs<N>             execution count of UCS slice N
util_wat<N>             utilization of Wireless Authentication
                        slice N [%]
exec_wat<N>             execution count of Wireless Authentication
                        slice N
util_wcp<N>             utilization of Wireless Cipher slice N [%]
exec_wcp<N>             execution count of Wireless Cipher slice N
util_cph<N>             utilization of Cipher slice N [%]
exec_cph<N>             execution count of Cipher slice N
util_ath<N>             utilization of Authentication slice N [%]
exec_ath<N>             execution count of Authentication slice N
======================= ========================================

The telemetry report file can be read with the following command::

    cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/device_data

If ``control`` is set to 1, only the current values of the
counters are displayed::

    <counter_name> <current>

If ``control`` is 2, 3 or 4, counters are displayed in the
following format::

    <counter_name> <current> <min> <max> <avg>

If a device lacks a specific accelerator, the corresponding
attribute is not reported.

This attribute is only available for qat_4xxx devices.
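
The two `device_data` line formats documented above can be consumed from
userspace with a small parser. The following is a minimal sketch, not part
of this commit: `tl_parse_line`, `struct tl_line` and the 32-byte name
buffer are hypothetical, and the sketch assumes every value is printed as
a non-negative decimal integer separated by whitespace.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TL_NAME_LEN 32 /* assumed buffer size; "%31s" below must match it */

/* Hypothetical container for one parsed device_data line. */
struct tl_line {
	char name[TL_NAME_LEN];
	uint64_t current;
	uint64_t min;
	uint64_t max;
	uint64_t avg;
	int has_stats; /* 1 if min/max/avg were present on the line */
};

/*
 * Parse "<counter_name> <current>" or
 * "<counter_name> <current> <min> <max> <avg>".
 * Returns 0 on success, -1 if the line matches neither format.
 */
int tl_parse_line(const char *line, struct tl_line *out)
{
	int n;

	assert(line != NULL && out != NULL);

	n = sscanf(line,
		   "%31s %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
		   out->name, &out->current,
		   &out->min, &out->max, &out->avg);

	if (n == 2) {
		/* control == 1: only the current value is reported. */
		out->min = 0;
		out->max = 0;
		out->avg = 0;
		out->has_stats = 0;
		return 0;
	}
	if (n == 5) {
		/* control in 2..4: min, max and average are reported too. */
		out->has_stats = 1;
		return 0;
	}
	return -1;
}
```

A caller would read `device_data` line by line (for example with
`fgets()`) and pass each line to `tl_parse_line()`, skipping lines for
which it returns -1.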
2 changes: 2 additions & 0 deletions drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include <adf_gen4_ras.h>
#include <adf_gen4_timer.h>
#include <adf_gen4_tl.h>
#include "adf_420xx_hw_data.h"
#include "icp_qat_hw.h"

@@ -543,6 +544,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}

2 changes: 2 additions & 0 deletions drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include "adf_gen4_ras.h"
#include <adf_gen4_timer.h>
#include <adf_gen4_tl.h>
#include "adf_4xxx_hw_data.h"
#include "icp_qat_hw.h"

@@ -453,6 +454,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}

3 changes: 3 additions & 0 deletions drivers/crypto/intel/qat/qat_common/Makefile
@@ -41,9 +41,12 @@ intel_qat-$(CONFIG_DEBUG_FS) += adf_transport_debug.o \
adf_fw_counters.o \
adf_cnv_dbgfs.o \
adf_gen4_pm_debugfs.o \
adf_gen4_tl.o \
adf_heartbeat.o \
adf_heartbeat_dbgfs.o \
adf_pm_dbgfs.o \
adf_telemetry.o \
adf_tl_debugfs.o \
adf_dbgfs.o

intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \
4 changes: 4 additions & 0 deletions drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include "adf_cfg_common.h"
#include "adf_rl.h"
#include "adf_telemetry.h"
#include "adf_pfvf_msg.h"

#define ADF_DH895XCC_DEVICE_NAME "dh895xcc"
@@ -254,6 +255,7 @@ struct adf_hw_device_data {
struct adf_ras_ops ras_ops;
struct adf_dev_err_mask dev_err_mask;
struct adf_rl_hw_data rl_data;
struct adf_tl_hw_data tl_data;
const char *fw_name;
const char *fw_mmp_name;
u32 fuses;
@@ -308,6 +310,7 @@ struct adf_hw_device_data {
#define GET_CSR_OPS(accel_dev) (&(accel_dev)->hw_device->csr_ops)
#define GET_PFVF_OPS(accel_dev) (&(accel_dev)->hw_device->pfvf_ops)
#define GET_DC_OPS(accel_dev) (&(accel_dev)->hw_device->dc_ops)
#define GET_TL_DATA(accel_dev) GET_HW_DATA(accel_dev)->tl_data
#define accel_to_pci_dev(accel_ptr) accel_ptr->accel_pci_dev.pci_dev

struct adf_admin_comms;
@@ -356,6 +359,7 @@ struct adf_accel_dev {
struct adf_cfg_device_data *cfg;
struct adf_fw_loader_data *fw_loader;
struct adf_admin_comms *admin;
struct adf_telemetry *telemetry;
struct adf_dc_data *dc_data;
struct adf_pm power_management;
struct list_head crypto_list;
3 changes: 3 additions & 0 deletions drivers/crypto/intel/qat/qat_common/adf_dbgfs.c
@@ -10,6 +10,7 @@
#include "adf_fw_counters.h"
#include "adf_heartbeat_dbgfs.h"
#include "adf_pm_dbgfs.h"
#include "adf_tl_debugfs.h"

/**
* adf_dbgfs_init() - add persistent debugfs entries
@@ -66,6 +67,7 @@ void adf_dbgfs_add(struct adf_accel_dev *accel_dev)
adf_heartbeat_dbgfs_add(accel_dev);
adf_pm_dbgfs_add(accel_dev);
adf_cnv_dbgfs_add(accel_dev);
adf_tl_dbgfs_add(accel_dev);
}
}

@@ -79,6 +81,7 @@ void adf_dbgfs_rm(struct adf_accel_dev *accel_dev)
return;

if (!accel_dev->is_vf) {
adf_tl_dbgfs_rm(accel_dev);
adf_cnv_dbgfs_rm(accel_dev);
adf_pm_dbgfs_rm(accel_dev);
adf_heartbeat_dbgfs_rm(accel_dev);
118 changes: 118 additions & 0 deletions drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
@@ -0,0 +1,118 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (c) 2023 Intel Corporation. */
#include <linux/export.h>
#include <linux/kernel.h>

#include "adf_gen4_tl.h"
#include "adf_telemetry.h"
#include "adf_tl_debugfs.h"

#define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)

#define ADF_GEN4_TL_SL_UTIL_COUNTER(_name)	\
	ADF_TL_COUNTER("util_" #_name,		\
			ADF_TL_SIMPLE_COUNT,	\
			ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_util, gen4))

#define ADF_GEN4_TL_SL_EXEC_COUNTER(_name)	\
	ADF_TL_COUNTER("exec_" #_name,		\
			ADF_TL_SIMPLE_COUNT,	\
			ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_exec_cnt, gen4))

/* Device level counters. */
static const struct adf_tl_dbg_counter dev_counters[] = {
	/* PCIe partial transactions. */
	ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_pci_trans_cnt)),
	/* Max read latency[ns]. */
	ADF_TL_COUNTER(MAX_RD_LAT_NAME, ADF_TL_COUNTER_NS,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_max)),
	/* Read latency average[ns]. */
	ADF_TL_COUNTER_LATENCY(RD_LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_acc),
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_cmpl_cnt)),
	/* Max get to put latency[ns]. */
	ADF_TL_COUNTER(MAX_LAT_NAME, ADF_TL_COUNTER_NS,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_max)),
	/* Get to put latency average[ns]. */
	ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_acc),
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_ae_put_cnt)),
	/* PCIe write bandwidth[Mbps]. */
	ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_in)),
	/* PCIe read bandwidth[Mbps]. */
	ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_out)),
	/* Page request latency average[ns]. */
	ADF_TL_COUNTER_LATENCY(PAGE_REQ_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_lat_acc),
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_cnt)),
	/* Page translation latency average[ns]. */
	ADF_TL_COUNTER_LATENCY(AT_TRANS_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_acc),
			       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_cnt)),
	/* Maximum uTLB used. */
	ADF_TL_COUNTER(AT_MAX_UTLB_USED_NAME, ADF_TL_SIMPLE_COUNT,
		       ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_max_tlb_used)),
};

/* Slice utilization counters. */
static const struct adf_tl_dbg_counter sl_util_counters[ADF_TL_SL_CNT_COUNT] = {
	/* Compression slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(cpr),
	/* Translator slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(xlt),
	/* Decompression slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(dcpr),
	/* PKE utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(pke),
	/* Wireless Authentication slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(wat),
	/* Wireless Cipher slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(wcp),
	/* UCS slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(ucs),
	/* Cipher slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(cph),
	/* Authentication slice utilization. */
	ADF_GEN4_TL_SL_UTIL_COUNTER(ath),
};

/* Slice execution counters. */
static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
	/* Compression slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(cpr),
	/* Translator slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(xlt),
	/* Decompression slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(dcpr),
	/* PKE execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(pke),
	/* Wireless Authentication slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(wat),
	/* Wireless Cipher slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(wcp),
	/* UCS slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(ucs),
	/* Cipher slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(cph),
	/* Authentication slice execution count. */
	ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
};

void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
{
	tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
	tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
	tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
	tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
	tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
	tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;

	tl_data->dev_counters = dev_counters;
	tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
	tl_data->sl_util_counters = sl_util_counters;
	tl_data->sl_exec_counters = sl_exec_counters;
}
EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
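
Each `ADF_TL_COUNTER_LATENCY` entry above pairs an accumulator register
with an event-count register, and `cpp_ns_per_cycle` supplies a
cycles-to-nanoseconds factor. One plausible way to turn such a pair into
an average latency is sketched below. This is an assumption for
illustration only: the function `tl_avg_latency_ns` is hypothetical, the
accumulator is assumed to be expressed in cycles, and the order of the
division and the unit conversion may differ from what the driver does in
files not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: average latency in nanoseconds from an
 * accumulated latency (assumed to be in cycles) and the number of
 * events that contributed to it.
 */
uint64_t tl_avg_latency_ns(uint64_t lat_acc_cycles, uint64_t event_cnt,
			   uint64_t ns_per_cycle)
{
	assert(ns_per_cycle > 0);

	/* No events in the sampling window: report 0 rather than divide by 0. */
	if (event_cnt == 0)
		return 0;

	/* Average in cycles first, then convert to nanoseconds. */
	return (lat_acc_cycles / event_cnt) * ns_per_cycle;
}
```

Dividing before multiplying keeps the intermediate value small at the
cost of truncating the fractional cycle; multiplying first would be more
precise but can overflow for large accumulators.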