Re: [patch 0/9] kdump: Patch series for s390 support

From: Vivek Goyal
Date: Fri Jul 15 2011 - 10:19:24 EST


On Fri, Jul 15, 2011 at 03:56:21PM +0200, Michael Holzheu wrote:
> Hello Vivek,
>
> On Thu, 2011-07-14 at 13:55 -0400, Vivek Goyal wrote:
>
> [snip]
>
> > > The first thing we want to do is to check if
> > > the purgatory is still fine, that is do a checksum. If we have the
> > > infrastructure in place to do one checksum then we can easily do the
> > > other checksums as well.
> >
> > Some piece of code you have to assume is fine. Are you not already
> > assuming that the IPL code you have in the first 64K bytes is fine and
> > that nobody has overwritten it?
>
> We can assume that the IPL dump code is fine, because it is freshly
> loaded into memory. Only when the disk is somehow corrupted we have a
> problem.
>
> > Are you not assuming that hook in panic()
> > (I think you are calling it shutdown trigger) is fine so that it
> > can help you jump to right place.
>
> Yes, that is correct for automatic dump in case of panic(). The panic()
> path can fail.
>
> But there are two other options where really *no* code that was in
> memory, when the system crashed, is used for the dump process or
> verification of kdump:
> 1) Manual IPL/boot of stand-alone dump by the operator via the virtual
> guest console
> 2) Automatic IPL/boot of stand-alone dump by our z/VM hypervisor
> watchdog

Hi Michael,

Ok. So IIUC, purgatory code corruption is equivalent to panic() code
corruption, and in that case the above two options will help an admin
capture the dump.

That's precisely the point I am trying to make: stand-alone dump
tools still remain the backup mechanism when kdump fails. Kdump can
fail either because the checksum of the loaded kernel is bad or because
the purgatory code itself got corrupted. In the first case, purgatory
itself can jump to the location that IPLs the dump tools, and in the
second case the above two options come into the picture (manual dump by
the operator or hypervisor watchdog initiated IPL).
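
To make the fallback order concrete, here is a rough sketch of the
decision in C. All names (dump_path, crash_state, select_dump_path) are
made up for illustration; none of this is existing kexec or s390 code,
and it assumes the two failure modes can be modelled as two flags.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum dump_path {
    DUMP_KDUMP,                     /* common case: kdump kernel captures and filters */
    DUMP_STANDALONE_FROM_PURGATORY, /* purgatory detects a bad checksum, IPLs dump tool */
    DUMP_STANDALONE_EXTERNAL,       /* operator or hypervisor watchdog IPLs dump tool */
};

struct crash_state {
    bool purgatory_intact;  /* purgatory code was not overwritten */
    bool kdump_checksum_ok; /* checksum of the loaded kdump kernel matches */
};

/* Pick the dump mechanism that can still be trusted. */
enum dump_path select_dump_path(const struct crash_state *s)
{
    /* Corrupted purgatory cannot verify anything; only an external IPL helps. */
    if (!s->purgatory_intact)
        return DUMP_STANDALONE_EXTERNAL;
    /* Purgatory runs but the kdump kernel is bad: jump to the dump tool IPL. */
    if (!s->kdump_checksum_ok)
        return DUMP_STANDALONE_FROM_PURGATORY;
    return DUMP_KDUMP;
}
```

Note that in none of the three cases does the dump tool need any
information from the kdump side.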

If we go down this path, it should simplify the design a lot. The dump
tools don't have to know anything about the kdump kernel and there is no
need to pass any information.

And in the common case kdump should be able to capture the dump and
filter it. Only in extreme corner cases do we need to trigger this dump
tool mechanism and capture a full memory dump.

How about doing it that way? This should not require many changes in
common kexec code. It will require some changes in kexec-tools though,
as you will have to create a mechanism for purgatory to jump to in
case the kdump kernel checksum fails.
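
Something along these lines is what I have in mind for the purgatory
side. This is only a sketch: loaded_image, simple_sum and
purgatory_pick_entry are hypothetical names, the toy checksum just
stands in for whatever digest purgatory really computes, and entry
points are modelled as plain addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the loaded kdump kernel image. */
struct loaded_image {
    const uint8_t *data;   /* start of the loaded image */
    size_t len;            /* image length in bytes */
    uint32_t expected_sum; /* checksum recorded at load time */
};

/* Toy checksum, a stand-in for the real digest. */
uint32_t simple_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + p[i];
    return sum;
}

/*
 * Return the address to jump to: the kdump kernel entry if the image
 * still matches its recorded checksum, otherwise the fallback entry
 * that IPLs the stand-alone dump tool.
 */
uintptr_t purgatory_pick_entry(const struct loaded_image *img,
                               uintptr_t kdump_entry,
                               uintptr_t fallback_entry)
{
    if (simple_sum(img->data, img->len) == img->expected_sum)
        return kdump_entry;
    return fallback_entry;
}
```

The only new piece kexec-tools would have to provide is the fallback
entry; the common kexec code does not need to know about it.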

Thanks
Vivek