Re: Kexec, DMA, and SMP

From: Eric W. Biederman (ebiederm@xmission.com)
Date: Mon Feb 17 2003 - 02:18:21 EST


Corey Minyard <minyard@acm.org> writes:

> An orderly shutdown will:
>
> ~ * claim locks and block as necessary
> ~ * free memory associated with the device
> ~ * flush device queues
> ~ * Fully shut down the device

Most of this is in required before a module is removed, but quite
a bit of this does not belong in the shutdown method. As user space,
and higher level code can down interfaces and remount hardware read
only.

So by the time it gets to the device driver for the final shutdown you
get:

- Locks are irrelevant because there are no other users (by
  definition).

- Queues are irrelevant as the device driver is not responsible for
  those. Someone needs to call sync, down the network interface,
  or the equivalent to push the last of the data through the device.

- Freeing memory is only needed if the kernel will persist, on a
  reboot or a halt that is not the case.
 
> An orderly shutdown should make sure the system remains sane after it
> finishes and the data on the device is correct.

But there are multiple layers to how that should be accomplished.
 
> A panic shutdown should only disable DMA with as little code as possible
> without locking, blocking, etc. No effort should be taken to keep the
> system sane (beyond clobbering memory), since it's not sane to begin with :-).
>
> You may want to say that this shutdown will be the panic shutdown and not be
> an orderly shutdown. That's fine, although I would suggest a name change.
> I couldn't find any documentation on what the shutdown call was supposed to do.

Currently the shutdown method is supposed to do the minimal amount needed to
place a device in a quiescent state before a reboot or a halt. And it
should optionally place the device in a state from which it can be
reinitialized by the driver later.

Essentially that is turning off DMA. And setting a few registers
so that the device can be restarted later. It explicitly does
not included freeing resources.

There may be some blocking waiting for the device, but I do not see
blocking waiting for other users of the device as correct. By
definition all other users are gone, so any locks in that code protect
nothing.

So I do not see how that differs, in any significant way from what you
figure the panic handler should do. Though I do admit I have
questions if the code would be reliable in a panic situation.

> |Because the kernel to handle the panic only initializes those devices
> |it can reliably initialize from any state. And it is living in an
> |area of memory the old kernel did not allow DMA to.
> |

> Are you sure this will be ok? I'm not sure either way. How much memory does
> a kernel take to boot up and operate for this? If it's a few meg, it's probably
>
> livable.
> If it's a lot of memory, it's probably not going to be acceptable.

Somewhere from 2-8Meg I would guess is sufficient on x86. It
primarily depends on the size of the dump kernel, and the user space.
8Meg used to be enough to run X.

> Plus, perhaps you would want to protect the output of the kernel dump somehow.
> That's going to be a lot more memory than you can reserve. And if you can shut
> off DMA, none of this should matter anyway.

Protect it? It does not need to be generated until after the new
kernel takes over. So the dump never needs to live in ram.

As for turning off DMA on a panic. It has already been shown that DMA
can not be reliably turned off in device independent way. Something
is definitely broken, and the odds are it is a driver. If there is
special panic code in a driver it likely to be the least tested code
path, if it exists at all. And with so many random devices out there
I do not see how it can be shown that they all have correct panic code.

I do not see shutting down all DMA as feasible, which is why I am
attempting to avoid the issue. With a reserved area of memory,
the only thing I am left to worry about is the unlikely case some
device doing DMA to an incorrect location that just happens to be
where the recovery kernel sits.

> The rest of what you said, about the panic kernel only taking the core dump and
> then rebooting, makes sense to me.

Good.

Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Feb 23 2003 - 22:00:16 EST