Re: [PATCH] ehea: Add kdump support

From: Michael Ellerman
Date: Mon Nov 26 2007 - 03:16:46 EST


On Fri, 2007-11-09 at 14:33 +0100, Thomas Klein wrote:
> To support ehea driver reloading in a kdump kernel the driver has to perform
> firmware handle deregistrations when the original kernel crashes. As there's
> currently no notifier chain for machine crashes this patch enables kdump support
> in the ehea driver by bending the ppc_md.machine_crash_shutdown hook to its own
> machine crash handler. The original machine_crash_shutdown() fn is called
> afterwards. This works fine as long as the ehea driver is the only one which
> does so. Problems may occur if other drivers do the same and unload regularly.
> This patch enables 2.6.24-rc2 to use kdump with ehea and only puts a very
> low risk on base kernel. In 2.6.24 we know ehea is the only user of this
> mechanism. The next step for 2.6.25 would be to add a proper notifier chain.
> The full solution might be that register_reboot_notifier() provides sth
> like a SYS_CRASH action. Please apply.
>
> Signed-off-by: Thomas Klein <tklein@xxxxxxxxxx>
>
> ---
> drivers/net/ehea/ehea.h | 2 +-
> drivers/net/ehea/ehea_main.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 29 insertions(+), 1 deletions(-)
>

Hi Thomas,

I'm sorry, but this patch is all wrong IMHO.

For kdump we have to assume that the kernel is fundamentally broken,
we've panicked, so something bad has happened - every line of kernel
code that is run decreases the chance that we'll successfully make it
into the kdump kernel.

So just calling unregister_driver() is no good, that's going to call
lots of code, try to take lots of locks, rely on lots of data structures
being uncorrupted etc. etc.

And the hijacking of machine_crash_shutdown() is IMO not an acceptable
solution, as you say it only works if EHEA is the only driver to do it.
And as soon as EHEA does it other drivers will want to do it.

What we need is the _minimal_ set of actions that can happen to make
EHEA work in the kdump kernel.

Solutions that might be better:

a) if there are a finite number of handles and we can predict their
values, just delete them all in the kdump kernel before the driver
loads.
b) if there are a small & finite number of handles, save their values
in a device tree property and have the kdump kernel read them and
delete them before the driver loads.
c) if neither of those work, provide a minimal routine that _only_
deletes the handles in the crashed kernel.
d) <something else>

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

Attachment: signature.asc
Description: This is a digitally signed message part