Re: Kexec, DMA, and SMP

From: Kenneth Sumrall (
Date: Mon Feb 10 2003 - 20:35:39 EST

"Eric W. Biederman" wrote:
> Suparna Bhattacharya <> writes:
> > On Sun, Feb 09, 2003 at 11:39:27AM -0700, Eric W. Biederman wrote:
> > > Corey Minyard <> writes:
> > >
> > > With respect to DMA and SMP handling for kexec on panic that case is
> > > much trickier. A lot of the normal methods simply don't apply because
> > > by definition in a panic something is broken, and that something may
> > > be the code we need to cleanly shutdown the hardware. But I an not
> > > ready to sacrifice a method that works well in a properly working
> > > kernel just because the panic case can't use it.
> > >
> > > In getting it working I suggest we start with the easy cases, where
> > > DMA and SMP are not big issues. And then we can have a working
> > > framework.
> >
> > I'd agree. That was also the idea behind the patch we'd just posted
> > for LKCD. With a basic working framework in hand that works for
> > simpler cases, we can now keep working on addressing more and harder
> > situations bit by bit.
> Agreed. I guess the primary question is can we trust the current
> device shutdown + reboot notifier path or do we need to make some
> large changes to avoid it.
So are the functions registered on the reboot notifier path guaranteed
to be non-blocking? In the kexec on panic case, calls that can block
would obviously be a bad thing. If they can block, perhaps we could add
a new flag SYS_PANIC or something like that to tell the driver to only
do a non-blocking shutdown of the chip.

> > Are you trying to address the possibility that DMA is overwriting
> > memory we are using in the recovery code, due to a runaway driver
> > or other code passing a wrong memory address to a device (e.g. in
> > a corrupted command area) ?
> Not primarily. Instead I am trying to address the possibility that
> DMA is overwriting the recovery code due to a device not being shutdown
> properly. Though it would happen to cover many cases of the wrong
> memory address being passed to a device.
The problem we were seeing was that rogue DMA from a network interface
chip was corrupting dentry's in the dirent cache when the rebooted
kernel was coming back up. This caused a whole new set of panics. :-(
Ken Sumrall
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sat Feb 15 2003 - 22:00:31 EST