Re: [PATCH] panic.c: export panic_on_oops

From: Ingo Molnar
Date: Mon Oct 12 2009 - 09:17:15 EST



* Simon Kagstrom <simon.kagstrom@xxxxxxxxxxxxxx> wrote:

> On Mon, 12 Oct 2009 14:20:23 +0200
> Ingo Molnar <mingo@xxxxxxx> wrote:
>
> > * David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> >
> > > On Mon, 2009-10-12 at 14:09 +0200, Ingo Molnar wrote:
> > > > Also, would it be possible to just simplify the thing and not do any
> > > > buffering at all? Extra buffering complexity in a console driver is only
> > > > asking for trouble. Or is flash storage write cycles optimization that
> > > > important in this case?
> > >
> > > That and the fact that on NAND flash you have to write full pages at a
> > > time -- that's 512 bytes, 2KiB or 4KiB depending on the type of chip.
> > > So we really do want to buffer it where we can.
> > >
> > > We don't want to write a 2KiB page for every line of printk output.
> >
> > Then i think the buffering is at the wrong place: we should instead
> > buffer in the generic layer and pass it to lowlevel if we know that we
> > have gone past a 2K boundary.
> >
> > The size of the generic log buffer is always a power of two so
> > detecting 2K boundaries is very easy. On any emergency the generic
> > console layer will do faster flushes - this is nothing the console
> > driver itself should bother with.
>
> But this is only part of the mtdoops problem (the reason why we don't
> write all the time). The current code only stores messages printed
> during an oops, and this behavior will surely change if the console
> driver gets large buffers of output - or it would have to take in the
> output unbuffered anyway.
>
> My patch changes this behavior, and with that I don't think buffered
> output would be a problem - it would indeed make it more simple as you
> say - assuming there is something like ->kernel_bug() that would flush
> the last 4KiB or so of messages to mtdoops when there is an oops or
> panic.
>
> > And that would avoid the whole workqueue logic - which is fragile to
> > be done in a printk to begin with.
>
> I'm afraid I don't really see this issue. The workqueue is used to
> write the buffer to the mtd device if we are not in a panic or
> interrupt context - in which case we do it directly.
>
> So it's only used when an oops is ongoing.

This fixation on 'panic' is so wrong!

90% of the bugs users care about don't involve any panic. And even if
there is a panic down the line, most of the interesting messages are in
the stream leading up to the panic - now tucked away in that async
workqueue mechanism and not visible.

There are two clean solutions i think:

1) add some new "ok, there's trouble!" callback to struct console and
the console driver could via that mechanism send out the _last_ 2KB
(or more) of kernel log messages. Basically we can go back in time by
looking at the dmesg buffer. The low level console driver does not
need to 'follow' the high level console state - it only wants to
print in case of trouble anyway.

2) or add buffered (flash-friendly) writes for all printk output - panic
and non-panic alike. This would be useful to debug suspend/resume
bugs for example. This would also optimize the packets of netconsole
output. (last i checked we sent a packet per line.)
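
A minimal userspace sketch of variant 1, to make the "go back in time"
idea concrete. Everything here is hypothetical: struct sketch_console, the
->trouble() hook, log_append() and log_copy_tail() are illustration names,
not existing kernel interfaces. The only thing taken from the discussion
above is that the log buffer size is a power of two, so the ring index is
a simple mask.

```c
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN  4096u               /* power of two, as assumed above */
#define LOG_BUF_MASK (LOG_BUF_LEN - 1)
#define TAIL_BYTES   2048u               /* "the _last_ 2KB" of messages */

static char log_buf[LOG_BUF_LEN];
static unsigned long log_end;            /* total bytes ever logged */

/* Hypothetical console with an extra "ok, there's trouble!" hook. */
struct sketch_console {
	void (*trouble)(struct sketch_console *con,
			const char *tail, size_t len);
	char   saved[TAIL_BYTES];        /* stands in for the flash page */
	size_t saved_len;
};

/* Append to the ring buffer; old data is silently overwritten. */
static void log_append(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		log_buf[log_end++ & LOG_BUF_MASK] = s[i];
}

/* Copy the most recent 'want' bytes into 'out', oldest byte first. */
static size_t log_copy_tail(char *out, size_t want)
{
	size_t avail = log_end < LOG_BUF_LEN ? log_end : LOG_BUF_LEN;
	unsigned long start;

	if (want > avail)
		want = avail;
	start = log_end - want;
	for (size_t i = 0; i < want; i++)
		out[i] = log_buf[(start + i) & LOG_BUF_MASK];
	return want;
}

/* Called synchronously on oops/panic/hang detection; no deferral. */
static void report_trouble(struct sketch_console *con)
{
	char tail[TAIL_BYTES];
	size_t n = log_copy_tail(tail, sizeof(tail));

	if (con->trouble)
		con->trouble(con, tail, n);
}

/* Example driver-side hook: keep the tail where a flash write would go. */
static void flash_trouble(struct sketch_console *con,
			  const char *tail, size_t len)
{
	if (len > sizeof(con->saved))
		len = sizeof(con->saved);
	memcpy(con->saved, tail, len);
	con->saved_len = len;
}
```

With this shape the driver never sees ordinary printk traffic; it is only
handed the tail of the log at the moment something goes wrong.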

The workqueue looks wrong in both variants. If we are panic-ing (or
hanging, or ...) then we are halting the machine - the workqueue has no
chance to actually execute.
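
For variant 2, a rough userspace sketch of page-sized buffering with a
synchronous emergency flush and no deferred work. The names (struct
page_writer, pw_write(), pw_emergency_flush(), write_page_sync()) are
made up for illustration, and the 2KiB page size is just one of the sizes
mentioned earlier in the thread. The device write is replaced by a
counter.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 2048u        /* assumed NAND page size; 512 and 4096 also occur */

struct page_writer {
	char     page[PAGE_BYTES];
	size_t   fill;          /* bytes buffered in the current page */
	unsigned pages_written; /* stands in for real device writes */
};

/* Stand-in for a direct, synchronous full-page write to the device. */
static void write_page_sync(struct page_writer *w)
{
	/* Pad the unused remainder so a whole page is always written. */
	memset(w->page + w->fill, 0xff, PAGE_BYTES - w->fill);
	w->pages_written++;
	w->fill = 0;
}

/* Normal path: buffer output, write only when a page boundary is hit. */
static void pw_write(struct page_writer *w, const char *s, size_t len)
{
	while (len > 0) {
		size_t room = PAGE_BYTES - w->fill;
		size_t n = len < room ? len : room;

		memcpy(w->page + w->fill, s, n);
		w->fill += n;
		s += n;
		len -= n;
		if (w->fill == PAGE_BYTES)
			write_page_sync(w);
	}
}

/* Trouble path: push out the partial page right now, in the caller's context. */
static void pw_emergency_flush(struct page_writer *w)
{
	if (w->fill > 0)
		write_page_sync(w);
}
```

Ten short lines of output cost zero page writes here; the partial page
only hits the device on pw_emergency_flush() or when it fills up.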

Ingo