kernel/rcu: Print out more information when NMI stall

cf: Re: `rcu: INFO: rcu_sched detected stalls on CPUs/tasks` on AMD EPYC server

> Huh. Neither CPU 30 nor CPU 94 responded to the NMI. This usually means
> that either NMIs aren't working or that the target CPUs are so deeply
> in trouble that they cannot respond to NMIs. One historic reason that
> the CPUs could be so deeply in trouble would be if the stack pointer
> started referencing unmapped memory, but I have no idea whether that
> applies to your particular CPUs.
>
> For whatever it is worth, the most extreme case of a CPU being in trouble
> was once long ago when the CPU simply failstopped, so that it was no
> longer executing instructions at all.
>
> On trick that might (or might not) get you more information is to force
> RCU to dump the stack remotely instead of sending NMIs. Here is an
> (untested) patch that should do the trick:

Even on Ryzen:

May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: 2-...0: (1 GPs behind) idle=c42/1/0x4000000000000000 softirq=404950836/404950839 fqs=14409
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: 3-...0: (1 GPs behind) idle=ce2/1/0x4000000000000000 softirq=407773923/407773926 fqs=14409
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: (detected by 5, t=60003 jiffies, g=239480749, q=1374055)
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Sending NMI from CPU 5 to CPUs 2:
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Sending NMI from CPU 5 to CPUs 3:
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: rcu_sched kthread starved for 20005 jiffies! g239480749 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cp
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: RCU grace-period kthread stack dump:
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu_sched I 0 11 2 0x80004000
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Call Trace:
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? __schedule+0x223/0x6c0
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? __switch_to_asm+0x40/0x70
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: schedule+0x40/0xb0
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: schedule_timeout+0x171/0x300
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? __next_timer_interrupt+0xc0/0xc0
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu_gp_kthread+0x6e4/0xf80
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? __schedule+0x22b/0x6c0
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? call_rcu+0x2f0/0x2f0
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: kthread+0x117/0x130
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ? kthread_create_worker_on_cpu+0x70/0x70
May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: ret_from_fork+0x22/0x40

Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>