RE: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

From: Lee Revell
Date: Thu May 19 2005 - 11:16:25 EST


On Thu, 2005-05-19 at 10:37 -0400, Steven Rostedt wrote:
> On Thu, 2005-05-19 at 16:23 +0200, kus Kusche Klaus wrote:
> > Does that mean that the core dump is written
> > with the rt prio of the task which dumps?
> >
>
> Yes, since the process itself that crashed is what is writing the core.
> So if a RT process crashes, it writes the core as whatever it was.
>
> > I'm not sure if this is a good idea:
> > Dumping a big core might take *ages* (at least w.r.t. realtime),
> > especially because it usually goes to flash memory, a CF card,
> > or some other really slow device.
> >
>
> This is interesting, since if a RT task is dumping core, that usually
> means that it crashed, and therefore there's a bug in the system. Also,
> unless the process is writing to something that requires a busy wait
> (which the serial might do, and probably some flashes), this shouldn't
> affect the system.

Interesting indeed. This could be caused by (possibly transient)
hardware failure as well as a bug. How do mission-critical hard RT
applications typically handle disasters like the RT process dumping
core? Presumably you have a hardware or software watchdog, and drop
into some kind of safe mode. It seems that you would need redundant
systems if you wanted to continue to handle the RT constraint while
recovering.

Lee
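The software-watchdog idea above can be sketched in user space. The following is a minimal C sketch, not a scheme proposed in the thread: it assumes a supervisor process that runs the RT work in a forked child and treats the child dying on a signal (the "dumping core" case) as the cue to drop into a safe mode. The names `crashing_worker` and `supervise_once` are hypothetical. The worker does not actually switch to a real-time policy (that needs privileges), and it sets `RLIMIT_CORE` to 0 so the demo does not write a core file.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the RT task: it simply crashes.
 * RLIMIT_CORE is set to 0 so this demo does not write a core file;
 * a real RT task would run under SCHED_FIFO/SCHED_RR here. */
static void crashing_worker(void)
{
    struct rlimit no_core = { 0, 0 };
    setrlimit(RLIMIT_CORE, &no_core);
    abort(); /* raises SIGABRT, whose default action is "terminate + core" */
}

/* Hypothetical supervisor: run worker() in a child and wait for it.
 * Returns 1 if the child was killed by a signal (the caller would then
 * enter its safe mode), 0 if it exited normally, -1 on fork/wait error.
 * If sig_out is non-NULL, the terminating signal is stored there.
 * On Linux, WCOREDUMP(status) would also say whether a core was written. */
static int supervise_once(void (*worker)(void), int *sig_out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        worker();
        _exit(0); /* worker returned normally */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status)) {
        if (sig_out)
            *sig_out = WTERMSIG(status);
        return 1;
    }
    return 0;
}
```

On a return value of 1, a real supervisor would put its outputs into a safe state before restarting the worker. Keeping the RT deadline met during that recovery is the harder part, which is why redundant systems come up above.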
