Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

From: Jiri Kosina
Date: Wed May 13 2015 - 10:26:15 EST


On Wed, 13 May 2015, 王龙 wrote:

> Hi all,
>
> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
> it triggers an NMI on each CPU and calls show_regs(). But this can lead
> to a hard lockup if the NMI interrupts a printk() already in progress.
>
> Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
> triggers, it switches the printk routine for that CPU to an NMI-safe printk
> function that records the output in a per_cpu seq_buf descriptor. After all
> NMIs have finished recording their data, the seq_bufs are printed in a safe
> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
>
> Could anyone give me some ideas?

Either you backport the seq_buf-based approach to the older kernel, or, if you
are working on a 3.4 kernel or earlier (basically any kernel preceding the
printk() revamp that happened in 7ff9554bb57 and after), you can use a
slightly simpler approach.
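
For orientation, the seq_buf-based idea boils down to roughly the following.
This is only a userspace sketch under stated assumptions, not the code from
the mainline commit: all names here (nmi_bufs, my_printk, nmi_enter_sketch,
nmi_exit_sketch, flush_nmi_bufs) and the sizes are hypothetical, and a plain
array index stands in for real per-CPU data.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS  4      /* hypothetical CPU count for this sketch */
#define BUF_SIZE 256    /* hypothetical per-CPU buffer size */

/* Per-CPU capture buffer, standing in for a per-CPU seq_buf. */
struct cpu_buf {
	char   data[BUF_SIZE];
	size_t len;
};

static struct cpu_buf nmi_bufs[NR_CPUS];

typedef int (*print_fn)(int cpu, const char *fmt, va_list ap);

/* Normal path: goes straight to the console (a real kernel takes locks here). */
static int direct_print(int cpu, const char *fmt, va_list ap)
{
	(void)cpu;
	return vprintf(fmt, ap);
}

/* NMI path: append to this CPU's own buffer; no locks, no console access. */
static int buffered_print(int cpu, const char *fmt, va_list ap)
{
	struct cpu_buf *b = &nmi_bufs[cpu];
	size_t room = BUF_SIZE - b->len;   /* always >= 1, keeps the NUL */
	int n = vsnprintf(b->data + b->len, room, fmt, ap);

	if (n < 0)
		return n;
	/* vsnprintf returns the untruncated length; clamp to what was stored. */
	if ((size_t)n >= room)
		n = (int)(room - 1);
	b->len += (size_t)n;
	return n;
}

/* Per-CPU print routine; NULL means "use direct_print". */
static print_fn cpu_print[NR_CPUS];

static int my_printk(int cpu, const char *fmt, ...)
{
	print_fn fn;
	va_list ap;
	int n;

	assert(cpu >= 0 && cpu < NR_CPUS);
	fn = cpu_print[cpu] ? cpu_print[cpu] : direct_print;
	va_start(ap, fmt);
	n = fn(cpu, fmt, ap);
	va_end(ap);
	return n;
}

/* Called at the start/end of the NMI handler on the given CPU. */
static void nmi_enter_sketch(int cpu) { cpu_print[cpu] = buffered_print; }
static void nmi_exit_sketch(int cpu)  { cpu_print[cpu] = NULL; }

/* Called from a safe (non-NMI) context once all CPUs have reported. */
static void flush_nmi_bufs(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct cpu_buf *b = &nmi_bufs[cpu];

		if (!b->len)
			continue;
		printf("CPU%d:\n%s", cpu, b->data);
		b->len = 0;
		b->data[0] = '\0';
	}
}
```

The point of the design is that the NMI side only ever touches its own
CPU's buffer, so it cannot deadlock against a printk() it interrupted; the
actual console output happens later via flush_nmi_bufs().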

It's the approach we used initially when we first found the issue, and it
is proven to work as well (but it's not applicable after Kay added all the
complexity to printk()).

You can see it in our SLE11 kernel tree, available on

http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

for example.

It's up to you to judge which is the least painful way :)

--
Jiri Kosina
SUSE Labs