[IA64] Proper handling of TLB errors from duplicate itr.d dropins
Jack Steiner noticed that duplicate TLB DTC entries do not cause a
linux panic.  See discussion:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html

The current TLB recovery code is recovering from the duplicate itr.d
dropins, masking the underlying problem.  This change modifies
the MCA recovery code to look for the TLB check signature of the
duplicate TLB entry and panic in that case.

Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Russ Anderson authored and Tony Luck committed Mar 8, 2007
1 parent 908e0a8 commit 618b206
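
The decision the commit message describes can be sketched in isolation as follows. This is a minimal illustration, not the kernel code: the `sketch_*` types, the enum, and `sketch_classify` are hypothetical stand-ins for the kernel's packed PAL bitfield structures, and only the fields the patch actually tests are modeled. The value of `PAL_TLB_CHECK_OP_PURGE` is the one the patch adds to `pal.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value taken from the patch's addition to include/asm-ia64/pal.h. */
#define PAL_TLB_CHECK_OP_PURGE 8

/*
 * Hypothetical, simplified stand-ins for pal_processor_state_info_t and
 * pal_tlb_check_info_t. The real types are packed bitfields with many
 * more members; only the fields tested by the patch appear here.
 */
struct sketch_proc_state {
	bool tc, cc, bc, rc, uc;	/* check-type flags tested by the patch */
};

struct sketch_tlb_check {
	unsigned int op;		/* operation that triggered the check */
	bool itr, dtc, itc;		/* error-location flags tested by the patch */
};

enum sketch_verdict {
	SKETCH_NOT_TLB_ONLY,	/* left to the other recovery paths */
	SKETCH_RECOVERED,	/* plain TLB check: treated as recovered */
	SKETCH_FATAL		/* duplicate-entry signature: panic */
};

static enum sketch_verdict
sketch_classify(const struct sketch_proc_state *psp,
		const struct sketch_tlb_check *ptci)
{
	assert(psp != NULL && ptci != NULL);

	/* Same gate as the patch: a TLB check with no other check set. */
	if (!(psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc)))
		return SKETCH_NOT_TLB_ONLY;

	/*
	 * Signature the patch treats as a duplicate TLB entry: a purge
	 * operation with none of itr, dtc, itc reported.
	 */
	if (ptci->op == PAL_TLB_CHECK_OP_PURGE
	    && !(ptci->itr || ptci->dtc || ptci->itc))
		return SKETCH_FATAL;

	return SKETCH_RECOVERED;
}
```

Before this change, the first condition alone was enough for `ia64_mca_handler` to set `recover`; the patch moves that test into `mca_drv.c` so the signature check can veto recovery.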
Showing 3 changed files with 36 additions and 6 deletions.
8 changes: 2 additions & 6 deletions arch/ia64/kernel/mca.c
@@ -1192,8 +1192,6 @@ void
 ia64_mca_handler(struct pt_regs *regs, struct switch_stack *sw,
 		 struct ia64_sal_os_state *sos)
 {
-	pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
-		&sos->proc_state_param;
 	int recover, cpu = smp_processor_id();
 	struct task_struct *previous_current;
 	struct ia64_mca_notify_die nd =
@@ -1223,10 +1221,8 @@ ia64_mca_handler(struct pt_regs *regs, struct switch_stack *sw,
 	/* Get the MCA error record and log it */
 	ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);

-	/* TLB error is only exist in this SAL error record */
-	recover = (psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc))
-	/* other error recovery */
-	   || (ia64_mca_ucmc_extension
+	/* MCA error recovery */
+	recover = (ia64_mca_ucmc_extension
 		&& ia64_mca_ucmc_extension(
 			IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA),
 			sos));
33 changes: 33 additions & 0 deletions arch/ia64/kernel/mca_drv.c
@@ -607,6 +607,33 @@ recover_from_platform_error(slidx_table_t *slidx, peidx_table_t *peidx,
 	return status;
 }

+/*
+ * recover_from_tlb_check
+ * @peidx: pointer of index of processor error section
+ *
+ * Return value:
+ *	1 on Success / 0 on Failure
+ */
+static int
+recover_from_tlb_check(peidx_table_t *peidx)
+{
+	sal_log_mod_error_info_t *smei;
+	pal_tlb_check_info_t *ptci;
+
+	smei = (sal_log_mod_error_info_t *)peidx_tlb_check(peidx, 0);
+	ptci = (pal_tlb_check_info_t *)&(smei->check_info);
+
+	/*
+	 * Look for signature of a duplicate TLB DTC entry, which is
+	 * a SW bug and always fatal.
+	 */
+	if (ptci->op == PAL_TLB_CHECK_OP_PURGE
+	    && !(ptci->itr || ptci->dtc || ptci->itc))
+		return fatal_mca("Duplicate TLB entry");
+
+	return mca_recovered("TLB check recovered");
+}
+
 /**
  * recover_from_processor_error
  * @platform: whether there are some platform error section or not
@@ -651,6 +678,12 @@ recover_from_processor_error(int platform, slidx_table_t *slidx,
 	if (psp->us || psp->ci == 0)
 		return fatal_mca("error not contained");

+	/*
+	 * Look for recoverable TLB check
+	 */
+	if (psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc))
+		return recover_from_tlb_check(peidx);
+
 	/*
 	 * The cache check and bus check bits have four possible states
 	 * cc bc
1 change: 1 addition & 0 deletions include/asm-ia64/pal.h
@@ -371,6 +371,7 @@ typedef u64 pal_mc_info_index_t;
  * dependent
  */

+#define PAL_TLB_CHECK_OP_PURGE 8

 typedef struct pal_process_state_info_s {
 	u64 reserved1 : 2,