Re: Query about kdump_msg hook into crash_kexec()

From: Eric W. Biederman
Date: Thu Feb 03 2011 - 16:13:21 EST


Seiji Aguchi <seiji.aguchi@xxxxxxx> writes:

> Hi,
>
>>PS: FWIW, Hitach folks have usage idea for their enterprise purpose, but
>> unfortunately I don't know its detail. I hope anyone tell us it.
>
> I explain the usage of kmsg_dump(KMSG_DUMP_KEXEC) in enterprise area.
>
> [Background]
> In our support service experience, we always need to detect root cause
> of OS panic.
> So, customers in enterprise area never forgive us if kdump fails and
> we can't detect the root cause of panic due to lack of materials for
> investigation.
>
>>- Why do you need a notification from inside crash_kexec(). IOW, what
>> is the usage of KMSG_DUMP_KEXEC.
>
>
> The usage of kdump(KMSG_DUMP_KEXEC) in enterprise area is getting
> useful information for investigating kernel crash in case kdump
> kernel doesn't boot.
>
> Kdump kernel may not start booting because there is a sha256 checksum
> verified over the kdump kernel before it starts booting.
> This means kdump kernel may fail even if there is no bug in kdump and
> we can't get any information for detecting root cause of kernel crash

Sure it is theoretically possible that the sha256 checksum gets
corrupted (I have never seen it happen or heard reports of it
happening). It is a feature that if someone has corrupted your code the
code doesn't try and run anyway and corrupt anything else.

That you are arguing against have such a feature in the code you use to
write to persistent storage is scary.

> As I mentioned in [Background], We must avoid lack of materials for
> investigation.
> So, kdump(KMSG_DUMP_KEXEC) is very important feature in enterprise
> area.

That sounds wonderful, but it doesn't jive with the
code. kmsg_dump(KMSG_DUMP_KEXEC) when I read through it was simply not
written to be robust when most of the kernel is suspect. Making it in
appropriate for use on the crash_kexec path. I do not believe kmsg_dump
has seen any testing in kernel failure scenarios.

There is this huge assumption that kmsg_dump is more reliable than
crash_kexec, from my review of the code kmsg_dump is simply not safe in
the context of a broken kernel. The kmsg_dump code last I looked code
won't work if called with interrupts disabled.

Furthermore kmsg_dump(KMSG_DUMP_KEXEC) is only useful for debugging
crash_kexec. Which has it backwards as it is kmsg_dump that needs the
debugging.

You just argued that it is better to corrupt the target of your
kmsg_dump in the event of a kernel failure instead of to fail silently.

I don't want that unreliable code that wants to corrupt my jffs
partition anywhere near my machines.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/