Skip to content

Commit

Permalink
Merge branch 'add-devlink-and-devlink-health-reporters-to'
Browse files Browse the repository at this point in the history
George Cherian says:

====================
Add devlink and devlink health reporters to octeontx2

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

For now, I have dropped the NIX block health reporters.
This series attempts to add health reporters only for the NPA block.
As per Jakub's suggestion separate reporters per event is used and also
got rid of the counters.
====================

Link: https://lore.kernel.org/r/20201211062526.2302643-1-george.cherian@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Jakub Kicinski committed Dec 15, 2020
2 parents 0e12c02 + 80b9414 commit 8718d60
Show file tree
Hide file tree
Showing 8 changed files with 912 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ Contents
- `Overview`_
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_

Overview
========
Expand Down Expand Up @@ -157,3 +158,52 @@ Egress
3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF.
4. NIX block transmits the pkt on the designated channel.
5. NPC MCAM entries can be installed to divert pkt onto a different channel.

Devlink health reporters
========================

NPA Reporters
-------------
The NPA reporters are responsible for reporting and recovering the following group of errors
1. GENERAL events
- Error due to operation of unmapped PF.
- Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).
2. ERROR events
- Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
- AQ Doorbell Error.
3. RAS events
- RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.
4. RVU events
- Error due to unmapped slot.

Sample Output
-------------
~# devlink health
pci/0002:01:00.0:
reporter hw_npa_intr
state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_gen
state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_err
state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
reporter hw_npa_ras
state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true

Each reporter dumps the
- Error Type
- Error Register value
- Reason in words

For eg:
~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_gen
NPA_AF_GENERAL:
NPA General Interrupt Reg : 1
NIX0: free disabled RX
~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_intr
NPA_AF_RVU:
NPA RVU Interrupt Reg : 1
Unmap Slot Error
~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
NPA_AF_ERR:
NPA Error Interrupt Reg : 4096
AQ Doorbell Error
1 change: 1 addition & 0 deletions drivers/net/ethernet/marvell/octeontx2/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ config OCTEONTX2_MBOX
config OCTEONTX2_AF
tristate "Marvell OcteonTX2 RVU Admin Function driver"
select OCTEONTX2_MBOX
select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
depends on PCI
help
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/marvell/octeontx2/af/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,4 @@ obj-$(CONFIG_OCTEONTX2_AF) += octeontx2_af.o
octeontx2_mbox-y := mbox.o rvu_trace.o
octeontx2_af-y := cgx.o rvu.o rvu_cgx.o rvu_npa.o rvu_nix.o \
rvu_reg.o rvu_npc.o rvu_debugfs.o ptp.o rvu_npc_fs.o \
rvu_cpt.o
rvu_cpt.o rvu_devlink.o
9 changes: 8 additions & 1 deletion drivers/net/ethernet/marvell/octeontx2/af/rvu.c
Original file line number Diff line number Diff line change
Expand Up @@ -2826,17 +2826,23 @@ static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (err)
goto err_flr;

err = rvu_register_dl(rvu);
if (err)
goto err_irq;

rvu_setup_rvum_blk_revid(rvu);

/* Enable AF's VFs (if any) */
err = rvu_enable_sriov(rvu);
if (err)
goto err_irq;
goto err_dl;

/* Initialize debugfs */
rvu_dbg_init(rvu);

return 0;
err_dl:
rvu_unregister_dl(rvu);
err_irq:
rvu_unregister_interrupts(rvu);
err_flr:
Expand Down Expand Up @@ -2868,6 +2874,7 @@ static void rvu_remove(struct pci_dev *pdev)

rvu_dbg_exit(rvu);
rvu_unregister_interrupts(rvu);
rvu_unregister_dl(rvu);
rvu_flr_wq_destroy(rvu);
rvu_cgx_exit(rvu);
rvu_fwdata_exit(rvu);
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/marvell/octeontx2/af/rvu.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,10 @@
#define RVU_H

#include <linux/pci.h>
#include <net/devlink.h>

#include "rvu_struct.h"
#include "rvu_devlink.h"
#include "common.h"
#include "mbox.h"
#include "npc.h"
Expand Down Expand Up @@ -422,6 +425,7 @@ struct rvu {
#ifdef CONFIG_DEBUG_FS
struct rvu_debugfs rvu_dbg;
#endif
struct rvu_devlink *rvu_dl;
};

static inline void rvu_write64(struct rvu *rvu, u64 block, u64 offset, u64 val)
Expand Down
Loading

0 comments on commit 8718d60

Please sign in to comment.