kernel/rcu: Print out more information when NMI stall
Cf. the thread "Re: `rcu: INFO: rcu_sched detected stalls on CPUs/tasks` on AMD EPYC server":

> Huh.  Neither CPU 30 nor CPU 94 responded to the NMI.  This usually means
> that either NMIs aren't working or that the target CPUs are so deeply
> in trouble that they cannot respond to NMIs.  One historic reason that
> the CPUs could be so deeply in trouble would be if the stack pointer
> started referencing unmapped memory, but I have no idea whether that
> applies to your particular CPUs.
>
> For whatever it is worth, the most extreme case of a CPU being in trouble
> was once long ago when the CPU simply failstopped, so that it was no
> longer executing instructions at all.
>
> One trick that might (or might not) get you more information is to force
> RCU to dump the stack remotely instead of sending NMIs.  Here is an
> (untested) patch that should do the trick:

The same happens even on Ryzen, where the NMIs sent to the stalled CPUs produce no backtrace:

    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu:    2-...0: (1 GPs behind) idle=c42/1/0x4000000000000000 softirq=404950836/404950839 fqs=14409
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu:    3-...0: (1 GPs behind) idle=ce2/1/0x4000000000000000 softirq=407773923/407773926 fqs=14409
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:         (detected by 5, t=60003 jiffies, g=239480749, q=1374055)
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Sending NMI from CPU 5 to CPUs 2:
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Sending NMI from CPU 5 to CPUs 3:
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: rcu_sched kthread starved for 20005 jiffies! g239480749 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cp
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu: RCU grace-period kthread stack dump:
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: rcu_sched       I    0    11      2 0x80004000
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel: Call Trace:
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? __schedule+0x223/0x6c0
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? __switch_to_asm+0x40/0x70
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  schedule+0x40/0xb0
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  schedule_timeout+0x171/0x300
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? __next_timer_interrupt+0xc0/0xc0
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  rcu_gp_kthread+0x6e4/0xf80
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? __schedule+0x22b/0x6c0
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? call_rcu+0x2f0/0x2f0
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  kthread+0x117/0x130
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
    May 11 20:20:27 hypnotoad.molgen.mpg.de kernel:  ret_from_fork+0x22/0x40
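
In this report, CPUs 2 and 3 (the `N-...` lines) stalled, CPU 5 detected the stall, and neither `Sending NMI` line is followed by a backtrace. When comparing many such reports, pulling out the stalled CPU numbers can help. The helper below is a hypothetical sketch (`parse_stalled_cpu()` is not an existing kernel or tool function) and assumes exactly the line format shown above, which may differ between kernel versions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel): given one syslog line, decide
 * whether it is a per-CPU RCU stall line such as
 *   "... kernel: rcu:    2-...0: (1 GPs behind) idle=c42/1/0x4000000000000000 ..."
 * and, if so, store the CPU number (here 2) in *cpu.
 * Assumes "rcu:", then blanks, then the decimal CPU number and a '-'.
 */
static bool parse_stalled_cpu(const char *line, int *cpu)
{
	const char *p;
	char *end;
	long val;

	assert(line != NULL && cpu != NULL);

	p = strstr(line, "rcu:");
	if (p == NULL)
		return false;
	p += strlen("rcu:");

	/* Skip the padding the kernel puts before the CPU number. */
	while (*p == ' ' || *p == '\t')
		p++;

	/* Lines like "rcu: INFO: ..." or "rcu: rcu_sched kthread ..." stop here. */
	if (!isdigit((unsigned char)*p))
		return false;

	val = strtol(p, &end, 10);
	if (*end != '-')
		return false;

	*cpu = (int)val;
	return true;
}
```

Fed the `rcu:` lines of the report above, it yields CPUs 2 and 3 and rejects the `INFO:` header and the `kthread starved` line.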

Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com>
pmenzel authored and donald committed Feb 25, 2022
1 parent 1a28f2c commit 358ac01
Showing 1 changed file with 1 addition and 2 deletions: kernel/rcu/tree_stall.h

    @@ -334,8 +334,7 @@ static void rcu_dump_cpu_stacks(void)
     		raw_spin_lock_irqsave_rcu_node(rnp, flags);
     		for_each_leaf_node_possible_cpu(rnp, cpu)
     			if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
    -				if (!trigger_single_cpu_backtrace(cpu))
    -					dump_cpu_task(cpu);
    +				dump_cpu_task(cpu);
     		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
     	}
     }
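
The effect of the change can be modelled in user space. As far as we can tell from `include/linux/nmi.h`, `trigger_single_cpu_backtrace()` returns true whenever the architecture can *send* an NMI backtrace request, not when the target CPU actually answers it. So on typical x86 configurations (including the EPYC and Ryzen systems above) the old code never reached `dump_cpu_task()`, which fits the reports where the NMIs produce no output. The sketch below is a hypothetical model, not kernel code: `struct model_rcu_node`, `dump_before()` and `dump_after()` are made-up names, and only the `1UL << (cpu - grplo)` bit layout mirrors the kernel's `leaf_node_cpu_bit()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a leaf struct rcu_node (illustration only). */
struct model_rcu_node {
	unsigned long qsmask;	/* bit set: CPU has not yet reported a quiescent state */
	int grplo, grphi;	/* lowest/highest CPU number covered by this leaf */
};

enum dump_method { DUMP_NONE, DUMP_NMI, DUMP_REMOTE };

/* Same bit layout as leaf_node_cpu_bit(): one bit per CPU, offset by grplo. */
static bool model_cpu_stalled(const struct model_rcu_node *rnp, int cpu)
{
	assert(cpu >= rnp->grplo && cpu <= rnp->grphi);
	return rnp->qsmask & (1UL << (cpu - rnp->grplo));
}

/*
 * Before the patch: nmi_sent models the return value of
 * trigger_single_cpu_backtrace(), i.e. whether the NMI request could be
 * sent, regardless of whether the target CPU ever answers it.
 */
static enum dump_method dump_before(const struct model_rcu_node *rnp, int cpu,
				    bool nmi_sent)
{
	if (!model_cpu_stalled(rnp, cpu))
		return DUMP_NONE;
	return nmi_sent ? DUMP_NMI : DUMP_REMOTE;
}

/* After the patch: every stalled CPU gets a remote dump via dump_cpu_task(). */
static enum dump_method dump_after(const struct model_rcu_node *rnp, int cpu)
{
	if (!model_cpu_stalled(rnp, cpu))
		return DUMP_NONE;
	return DUMP_REMOTE;
}
```

With `qsmask = 0x0c` (CPUs 2 and 3 stalled, as in the Ryzen report), `dump_before(&rnp, 2, true)` yields `DUMP_NMI` even if CPU 2 never handles the NMI, whereas `dump_after(&rnp, 2)` yields `DUMP_REMOTE`. The trade-off is that a remote dump reads the stalled task's stack from another CPU while that task may still be running, so it can be less reliable than a backtrace taken on the stalled CPU itself.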