Re: [IBM] RE: [BUG] Fusion MPT Base Driver initialization failurewit h kdum p

From: Andrew Morton
Date: Mon Jul 25 2005 - 17:14:10 EST


Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> > If you don't stop the DMA engines before you boot the new kernel, the
> > addresses they have to send data to will now be random points in that
> > kernel's memory, leading to potential corruption of the new kernel
> > image.
>
> [Copying it to fastboot and linux-kernel mailing lists]
>
> We are booting second kernel (capture kernel) from a reserved memory location
> to take care of on-going DMA issues. So even if some DMA transactions are going
> on after the crash they will not corrupt the new kernel.
>
> >
> > The interrupt panic of the fusion is probably a symptom of this: I bet a
> > DMA transfer has just completed and the interrupt is to inform us of
> > this (however, in the new kernel we're not expecting any transfers).
>
> That might very well be the case. So driver should simply ignore the interrupt
> when it is not expecting it or it should reset the device if it finds that
> some interrupts are pending when it should not have been there.
>
> Basically it is a matter of hardening the driver so that it can handle/
> initialize the device even if the device is not in reset state.

I'd expect that a lot of these problems could be reduced by simply pausing
for a while in the crash handler, wait for I/O to complete.

Other times these failures point at flaws and dubious assumptions in
drivers themselves, fixing of which tends to collaterally help platform
power mangement operations.

However, I expect that it will always be the case that setups which try to
perform crashdumps to their main production disks will be less reliable
than setups which set aside hardware for the dumping.

If it was me, and if I really cared about reliable dumps, I'd make sure
that each machine had a $6 NIC set aside for crashdumping, and the dump
kernel uses that NIC for an NFS dump and has no other device drivers
configured.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/