Re: [lkcd-devel] Re: What's left over.

From: Suparna Bhattacharya (suparna@in.ibm.com)
Date: Tue Nov 05 2002 - 06:42:30 EST


On Thu, Oct 31, 2002 at 07:37:05PM -0300, Werner Almesberger wrote:
> Jeff Garzik wrote:
> > That said, I used to be an LKCD cheerleader until a couple people made
> > some good points to me: it is not nearly low-level enough to truly be
> > of use in crash situations.
>
> I'm not so convinced about this. I like the Mission Critical
> approach: save the dump to memory, then either boot through the
> firmware or through bootimg (nowadays, that would be kexec),
> then retrieve the dump from memory, and do whatever you like
> with it.
>
> The huge advantage here is that you don't need a ton of
> specialized dump drivers and/or have much of the original kernel
> infrastructure to be in a usable state. The rebooted system will
> typically be stable enough to offer the full range of utilities,
> including up to date drivers for all possible devices, so you
> can safely write to disk, scp all the mess to your support
> critter, or post an automatic flame to linux-kernel :-)
>
> The weak points of the Mission Critical design are that early
> memory allocation in the kernel needs to be tightly controlled,
> that architectures that wipe CPU caches on reboot need to
> commit them to memory before the firmware restart, and that
> drivers need to be able to recover from an "unclean" hardware
> state. (I think we'll see much of the latter happen as kexec
> advances. The other two issues aren't really special.)
>
> Actually, at the RAS BOF I thought that IBM were developing LKCD
> in this direction, and had also eliminated a few not so elegant
> choices of Mission Critical's original design. I haven't looked

Yes, we are putting that in as one of the alternative dump targets
available. I have done quite a bit of work on that implementing the
ideas we talked about at OLS, and that's what I've been referring
to as the memory dump target. Its not quite ready yet and we
need something like kexec to be available which we can use on Intel
systems to achieve the softboot (the acceptance status of that still
doesn't seem to be clear), so I was looking at this as a
follow-on thing once the core infrastructure is there. More so
because we probably need to give it some time to stabilize and try
it on different environments and look at the issues you mention.

Why do we even consider the other options when we are doing
this already ? Well, as we discussed earlier there's non-disruptive dumps
for one, where this wouldn't work. The other is that before overwriting
memory we need to be able to stop all activity in the system for certain
(system may appear hung/locked up) and I'm not fully certain about
how to do this for all environments. Maybe an answer lies in
rethinking some parts of the algorithm a bit.

Also having the interface allows people to develop more specific/
reliable solutions for their environment. So we do not necessiate
code duplication, but if something exists, then the infrastructure
can use it.

The general feeling here is that a one solution fits all thing
may not work best right now ... and hence the focus on an interface
based approach that gives us the needed flexibility.

> at the LKCD code, but the descriptions sound as if all the
> special-case cruft seems to be back again, which I would find a
> little disappointing.

Hope that helps a bit.

Regards
Suparna

>
> There might be a case for specialized low-overhead dump handlers
> for small embedded systems and such, but they're probably better
> maintained outside of the mainstream kernel. (They're more like
> firmware anyway.)
>
> - Werner
>
> --
> _________________________________________________________________________
> / Werner Almesberger, Buenos Aires, Argentina wa@almesberger.net /
> /_http://www.almesberger.net/____________________________________________/
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: Influence the future
> of Java(TM) technology. Join the Java Community
> Process(SM) (JCP(SM)) program now.
> http://ads.sourceforge.net/cgi-bin/redirect.pl?sunm0004en
> _______________________________________________
> lkcd-devel mailing list
> lkcd-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/lkcd-devel

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Nov 07 2002 - 22:00:37 EST