Skip to content

Commit

Permalink
x86/MCE/AMD: Clear DFR errors found in THR handler
Browse files Browse the repository at this point in the history
AMD's MCA Thresholding feature counts errors of all severity levels, not
just correctable errors. If a deferred error causes the threshold limit
to be reached (it was the error that caused the overflow), then both a
deferred error interrupt and a thresholding interrupt will be triggered.

The order of the interrupts is not guaranteed. If the threshold
interrupt handler is executed first, then it will clear MCA_STATUS for
the error. It will not check or clear MCA_DESTAT which also holds a copy
of the deferred error. When the deferred error interrupt handler runs it
will not find an error in MCA_STATUS, but it will find the error in
MCA_DESTAT. This will cause two errors to be logged.

Check for deferred errors when handling a threshold interrupt. If a bank
contains a deferred error, then clear the bank's MCA_DESTAT register.

Define a new helper function to do the deferred error check and clearing
of MCA_DESTAT.

  [ bp: Simplify, convert comment to passive voice. ]

Fixes: 37d43ac ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com
  • Loading branch information
Yazen Ghannam authored and Borislav Petkov committed Oct 27, 2022
1 parent 9abf231 commit bc1b705
Showing 1 changed file with 20 additions and 13 deletions.
33 changes: 20 additions & 13 deletions arch/x86/kernel/cpu/mce/amd.c
Original file line number Diff line number Diff line change
Expand Up @@ -788,6 +788,24 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
return status & MCI_STATUS_DEFERRED;
}

static bool _log_error_deferred(unsigned int bank, u32 misc)
{
if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
mca_msr_reg(bank, MCA_ADDR), misc))
return false;

/*
* Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
* Return true here to avoid accessing these registers.
*/
if (!mce_flags.smca)
return true;

/* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
return true;
}

/*
* We have three scenarios for checking for Deferred errors:
*
Expand All @@ -799,19 +817,8 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
*/
static void log_error_deferred(unsigned int bank)
{
bool defrd;

defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
mca_msr_reg(bank, MCA_ADDR), 0);

if (!mce_flags.smca)
return;

/* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
if (defrd) {
wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
if (_log_error_deferred(bank, 0))
return;
}

/*
* Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
Expand All @@ -832,7 +839,7 @@ static void amd_deferred_error_interrupt(void)

static void log_error_thresholding(unsigned int bank, u64 misc)
{
_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc);
_log_error_deferred(bank, misc);
}

static void log_and_reset_block(struct threshold_block *block)
Expand Down

0 comments on commit bc1b705

Please sign in to comment.