Re: Query about kdump_msg hook into crash_kexec()

From: Eric W. Biederman
Date: Tue Feb 08 2011 - 12:35:26 EST


Vivek Goyal <vgoyal@xxxxxxxxxx> writes:

> On Thu, Feb 03, 2011 at 05:08:01PM -0500, Seiji Aguchi wrote:
>> Hi Eric,
>>
>> Thank you for your prompt reply.
>>
>> I would like to consider "needs in the enterprise area" and "implementation of kmsg_dump()" separately.
>>
>> (1) Needs in the enterprise area
>> In case of kdump failure, we would like to store the kernel log buffer to
>> NVRAM/flash memory to help detect the root cause of the kernel crash.
>>
>> (2) Implementation of kmsg_dump()
>> You suggest reviewing/testing the code of kmsg_dump() more.
>>
>> What do you think about (1)?
>> Is it acceptable to you?
>
> Ok, I am just trying to think out loud about this problem and see if
> something fruitful comes out which paves the way forward.
>
> - So ideally we would like kdump_msg() to be called after crash_kexec() so
>   that unaudited (third-party modules), unreliable calls do not
>   compromise the reliability of the kdump operation.
>
>   But the Hitachi folks seem to want to save at least the kernel buffers
>   somewhere in NVRAM etc., because they think that kdump can be
>   unreliable and we might not capture any information after the crash. So
>   they want two mechanisms in place: a lightweight one which tries to
>   save the kernel buffers to NVRAM, and a heavyweight one which tries to
>   save the entire/filtered kernel core.
>
>   Personally I am not too excited about the idea, but I guess I can live
>   with it. We can try to audit at least the in-kernel modules; for external
>   modules we don't have much control, and we live with the fact that if a
>   module screws up, we don't capture the dump.
>
>   Those who don't want this behavior can do one of three things:
>
>   - Disable kdump_msg() at compile time.
>   - Not load any module which registers for kdump_msg().
>   - Implement a /proc tunable which allows controlling this
>     behavior (a rough sketch follows).
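>
> A rough sketch of what such a tunable could look like (untested; the
> name kmsg_dump_on_crash and the wiring are made up, only the sysctl
> boilerplate is standard):
>
>   #include <linux/sysctl.h>
>
>   static int kmsg_dump_on_crash = 1;      /* hypothetical knob */
>   static int zero, one = 1;
>
>   static struct ctl_table kmsg_dump_table[] = {
>           {
>                   .procname       = "kmsg_dump_on_crash",
>                   .data           = &kmsg_dump_on_crash,
>                   .maxlen         = sizeof(int),
>                   .mode           = 0644,
>                   .proc_handler   = proc_dointvec_minmax,
>                   .extra1         = &zero,
>                   .extra2         = &one,
>           },
>           { }
>   };
>
>   static struct ctl_table kmsg_dump_root[] = {
>           { .procname = "kernel", .mode = 0555, .child = kmsg_dump_table },
>           { }
>   };
>
>   /* from some init path: register_sysctl_table(kmsg_dump_root); */
>
> The panic path would then call kdump_msg() only when kmsg_dump_on_crash
> is set.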
>
> - Ok, having said why we want it, the question becomes how to do it so
>   that it works reasonably well.
>
> - There seems to be one common requirement of kmsg_dump() and kdump(),
>   and that is stopping the other cpus reliably (using NMI if possible).
>   Can we try to share this code between kmsg_dump() and crash_kexec()?
>   So something like the following.
>
>   - panic happens
>   - Do all the activities related to printing the panic string and the
>     stack dump.
>   - Stop the other cpus.
>     - This can probably be done with the equivalent of the
>       machine_crash_shutdown() function. In fact this function
>       can probably be broken down into two parts. The first part
>       does shutdown_prepare(), where all the other cpus are shot
>       down, and the second part does the actual disabling of the
>       LAPIC/IOAPIC, saving of cpu registers, etc.
>
>   if (mutex_trylock(&some_shutdown_mutex)) {
>           /* set up regs, fix vmcoreinfo etc. */
>           crash_kexec_prepare();
>           machine_shutdown_prepare();
>           kdump_msg();
>           crash_kexec_execute();
>           /* Also call panic_notifier_list here? */
>   }
>
>   crash_kexec_prepare() {
>           crash_setup_regs(&fixed_regs, regs);
>           crash_save_vmcoreinfo();
>   }
>
>   crash_kexec_execute() {
>           /* Shut down the lapic/ioapic, save this cpu's registers etc. */
>           machine_shutdown();
>           machine_kexec();
>   }
>
> So basically we break the machine_shutdown() function down into two parts
> and start sharing the common part between kdump_msg(), crash_kexec() and
> possibly the panic notifiers.
>
> If kdump is not configured, then after executing kdump_msg() and the panic
> notifiers we should either be sitting in a tight loop with interrupts
> enabled, waiting for somebody to reboot the box, or reboot the system
> when panic_timeout lapses (roughly as sketched below).
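>
> Roughly something like this (just a sketch, modeled loosely on what
> panic() already does today):
>
>   if (panic_timeout > 0) {
>           int i;
>
>           /* give the console panic_timeout seconds, then reboot */
>           for (i = 0; i < panic_timeout; i++)
>                   mdelay(1000);
>           emergency_restart();
>   }
>
>   /* no timeout configured: spin with interrupts enabled */
>   local_irq_enable();
>   for (;;)
>           cpu_relax();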
>
> Eric, does it make sense to you?

kexec on panic doesn't strictly require that we stop other cpus.

What makes sense to me at this point is for someone on the kmsg_dump
side to make a strong case that the code actually works in a crash dump
scenario. We have lots of experience over the years that says a design
like kmsg_dump is attractive but turns out to be an unreliable piece of
junk that fails just when you need it. That is because developers only
test the case where the kernel is happy, and because people share code
with the regular-path drivers, and that code assumes things are happy.

I forget exactly why, but last I looked,

	local_irq_disable()
	kmsg_dump()
	local_irq_enable()

was a recipe for disaster, and you have to be at least that good to even
have a chance of working in a crash dump scenario.
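
Even a dumper that only hopes to survive that path has to be written
like crash code. Something like the following (a rough sketch against
the current kmsg_dumper interface; nvram_poll_write() is a made-up
stand-in for a polling, lock-free writer the platform would have to
provide):

	#include <linux/kmsg_dump.h>
	#include <linux/module.h>

	/* hypothetical polling writer provided by the platform driver */
	extern void nvram_poll_write(const char *buf, unsigned long len);

	static void nvram_dump(struct kmsg_dumper *dumper,
			       enum kmsg_dump_reason reason,
			       const char *s1, unsigned long l1,
			       const char *s2, unsigned long l2)
	{
		if (reason != KMSG_DUMP_PANIC && reason != KMSG_DUMP_OOPS)
			return;

		/*
		 * The machine is already broken here: no sleeping, no
		 * allocations, no locks shared with the normal I/O path.
		 */
		nvram_poll_write(s1, l1);	/* older part of the log */
		nvram_poll_write(s2, l2);	/* newest messages */
	}

	static struct kmsg_dumper nvram_dumper = {
		.dump = nvram_dump,
	};

	static int __init nvram_dump_init(void)
	{
		return kmsg_dump_register(&nvram_dumper);
	}
	module_init(nvram_dump_init);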

In part I am puzzled why kmsg_dump doesn't just use the printk
interface. Strangely enough, printk works in the event of a crash and
has been shown to be reliable over the years.
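
I.e. something as simple as registering a console that writes to the
NVRAM gets the messages out through the path we already trust, as they
are printed (again only a sketch; nvram_poll_write() is the same
made-up writer as above):

	#include <linux/console.h>

	extern void nvram_poll_write(const char *buf, unsigned long len);

	static void nvram_con_write(struct console *con, const char *buf,
				    unsigned len)
	{
		nvram_poll_write(buf, len);
	}

	static struct console nvram_console = {
		.name	= "nvram",
		.write	= nvram_con_write,
		.flags	= CON_ENABLED | CON_PRINTBUFFER,
		.index	= -1,
	};

	/*
	 * register_console(&nvram_console) from an init path; every
	 * printk, including the panic messages, then lands in NVRAM
	 * as it is printed.
	 */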

Eric