Re: [RFC Patch 0/2] Slimdump framework using CRASH_REASON - v2

From: K.Prasad
Date: Wed Nov 30 2011 - 12:16:06 EST


On Mon, Nov 28, 2011 at 09:24:02AM -0500, Vivek Goyal wrote:
> On Wed, Nov 23, 2011 at 11:03:18PM +0530, K.Prasad wrote:
> > On Mon, Nov 21, 2011 at 10:17:27AM -0500, Vivek Goyal wrote:
> > > On Mon, Nov 21, 2011 at 03:24:05PM +0530, K.Prasad wrote:
[snipped]
> >
> > The kernel message buffers can be obtained by using the --dump-dmesg
> > option of makedumpfile, but again that's risky. We wouldn't know if it'll
> > cause access to the faulty memory (which is why the previous method of
> > having a new ELF note in a pristine location is much safer).
> >
> > The method in this patch is quite primitive in that it tells the user
> > nothing more than a one-line cause of the crash. One should take help from
> > other tools (such as service processor/firmware/ACPI logs, or previous
> > corrected error logs) to infer the location of the bad memory.
>
> And how does one get to firmware/ACPI logs? Many systems don't have a
> service processor either.
>
> I think extracting kernel buffers by default in case of MCE is reasonable.
> This should allow somebody to figure out some MCE related information.
>
> You might want to modify makedumpfile so that it does not try to access
> pages marked poisoned.
>

I'm not sure how easy or difficult it would be to skip hw-poisoned pages
from user space, i.e. from makedumpfile. I'll start working on the relevant
changes though, and keep the community posted with the patches.
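
To make the idea concrete, here is a minimal sketch of the filtering step,
not actual makedumpfile code. It assumes the tool has already collected one
page-flags word per pfn from the crashed kernel's struct page array, and that
the bit position of the hwpoison flag is known (it varies with kernel version
and config, so the crashed kernel would have to export it somehow, for
instance through vmcoreinfo). The names exclude_hwpoisoned_pages(),
page_is_hwpoisoned() and HYPOTHETICAL_PG_HWPOISON_BIT are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: placeholder bit position of the hwpoison flag in page->flags.
 * The real value is kernel- and config-dependent and must come from the
 * crashed kernel; 6 is used here only so the sketch compiles.
 */
#define HYPOTHETICAL_PG_HWPOISON_BIT 6

/* Return non-zero if the given page-flags word has the hwpoison bit set. */
static int page_is_hwpoisoned(uint64_t flags, unsigned int hwpoison_bit)
{
	return (int)((flags >> hwpoison_bit) & 1u);
}

/*
 * Walk an array of per-page flag words (index == pfn) and clear the
 * matching bit in dump_bitmap for every poisoned page, so that a later
 * copy pass never reads the contents of that page.
 *
 * dump_bitmap holds one bit per pfn; a set bit means "dump this page".
 * Returns the number of pages excluded.
 */
static size_t exclude_hwpoisoned_pages(const uint64_t *page_flags,
				       size_t nr_pages,
				       unsigned char *dump_bitmap,
				       unsigned int hwpoison_bit)
{
	size_t pfn, excluded = 0;

	assert(page_flags != NULL && dump_bitmap != NULL);
	assert(hwpoison_bit < 64);

	for (pfn = 0; pfn < nr_pages; pfn++) {
		if (page_is_hwpoisoned(page_flags[pfn], hwpoison_bit)) {
			dump_bitmap[pfn / 8] &= (unsigned char)~(1u << (pfn % 8));
			excluded++;
		}
	}
	return excluded;
}
```

Only the flag words are inspected here; the poisoned pages themselves are
never touched. Whether the flag words can be read safely from user space is
exactly the part I still need to work out.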

Thanks,
K.Prasad
