Skip to content

Commit

Permalink
x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
Browse files Browse the repository at this point in the history
Dave Jones reports that offlining a CPU leads to this trace:

numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
 [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
 [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
 [<ffffffff815ef082>] _cpu_down+0x302/0x350
 [<ffffffff815ef106>] cpu_down+0x36/0x50
 [<ffffffff815f1c9d>] store_online+0x8d/0xd0
 [<ffffffff813edc48>] dev_attr_store+0x18/0x30
 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
 [<ffffffff811adfb2>] vfs_write+0xa2/0x170
 [<ffffffff811ae16c>] sys_write+0x4c/0xa0
 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b

However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
  • Loading branch information
Srivatsa S. Bhat authored and Tony Luck committed Apr 2, 2013
1 parent 07961ac commit 7a0c819
Show file tree
Hide file tree
Showing 3 changed files with 8 additions and 23 deletions.
4 changes: 2 additions & 2 deletions arch/x86/include/asm/mce.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,13 +146,13 @@ DECLARE_PER_CPU(struct device *, mce_device);
void mce_intel_feature_init(struct cpuinfo_x86 *c);
void cmci_clear(void);
void cmci_reenable(void);
void cmci_rediscover(int dying);
void cmci_rediscover(void);
void cmci_recheck(void);
#else
static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
static inline void cmci_clear(void) {}
static inline void cmci_reenable(void) {}
static inline void cmci_rediscover(int dying) {}
static inline void cmci_rediscover(void) {}
static inline void cmci_recheck(void) {}
#endif

Expand Down
2 changes: 1 addition & 1 deletion arch/x86/kernel/cpu/mcheck/mce.c
Original file line number Diff line number Diff line change
Expand Up @@ -2358,7 +2358,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)

if (action == CPU_POST_DEAD) {
/* intentionally ignoring frozen here */
cmci_rediscover(cpu);
cmci_rediscover();
}

return NOTIFY_OK;
Expand Down
25 changes: 5 additions & 20 deletions arch/x86/kernel/cpu/mcheck/mce_intel.c
Original file line number Diff line number Diff line change
Expand Up @@ -285,39 +285,24 @@ void cmci_clear(void)
raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
}

static long cmci_rediscover_work_func(void *arg)
static void cmci_rediscover_work_func(void *arg)
{
int banks;

/* Recheck banks in case CPUs don't all have the same */
if (cmci_supported(&banks))
cmci_discover(banks);

return 0;
}

/*
* After a CPU went down cycle through all the others and rediscover
* Must run in process context.
*/
void cmci_rediscover(int dying)
/* After a CPU went down cycle through all the others and rediscover */
void cmci_rediscover(void)
{
int cpu, banks;
int banks;

if (!cmci_supported(&banks))
return;

for_each_online_cpu(cpu) {
if (cpu == dying)
continue;

if (cpu == smp_processor_id()) {
cmci_rediscover_work_func(NULL);
continue;
}

work_on_cpu(cpu, cmci_rediscover_work_func, NULL);
}
on_each_cpu(cmci_rediscover_work_func, NULL, 1);
}

/*
Expand Down

0 comments on commit 7a0c819

Please sign in to comment.