Re: [PATCH 00/22] HWPOISON: Intro (v5)

From: Nick Piggin
Date: Mon Jun 15 2009 - 08:25:45 EST


On Mon, Jun 15, 2009 at 08:10:01PM +0800, Wu Fengguang wrote:
> On Mon, Jun 15, 2009 at 03:19:07PM +0800, Nick Piggin wrote:
> > > For KVM you need early kill, for the others it remains to be seen.
> >
> > Right. It's almost like you need to do a per-process thing, and
> > those that can handle things (such as the new SIGBUS or the new
> > EIO) could get those, and others could be killed.
>
> To send early SIGBUS kills to processes that have called
> sigaction(SIGBUS, ...)? KVM will surely do that. For other apps we
> don't mind whether they can understand that signal at all.

What about apps that hook SIGBUS for some other purpose and
do not understand the new type of SIGBUS signal? What happens
to those?
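
To make the concern concrete, here is a minimal sketch of what an
application would have to do to tell the new SIGBUS apart from an
ordinary one. It assumes the si_code values BUS_MCEERR_AO (advisory,
"early") and BUS_MCEERR_AR (raised on access) as they appear in
mainline Linux headers; the names used by this patch series may
differ. classify_sigbus() and install_sigbus_handler() are
hypothetical helpers, not part of any existing API.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>

/* Fallbacks for older headers; values assumed from mainline Linux. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum sigbus_kind {
    SIGBUS_ORDINARY = 0,      /* alignment fault, truncated mmap, ... */
    SIGBUS_POISON_ADVISORY,   /* early notification, page not yet touched */
    SIGBUS_POISON_ON_ACCESS   /* poisoned page was actually accessed */
};

/* Map a siginfo si_code to the kind of SIGBUS it describes. */
static enum sigbus_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return SIGBUS_POISON_ADVISORY;
    case BUS_MCEERR_AR:
        return SIGBUS_POISON_ON_ACCESS;
    default:
        return SIGBUS_ORDINARY;
    }
}

static volatile sig_atomic_t poison_seen;
static void *volatile poison_addr;

/* Only async-signal-safe work is done here. */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    if (classify_sigbus(info->si_code) == SIGBUS_POISON_ADVISORY) {
        /* Record the address; the main loop can drop the page later. */
        poison_addr = info->si_addr;
        poison_seen = 1;
        return;
    }
    /* Anything else: restore the default action so that the retried
     * faulting access terminates the process as usual. */
    signal(sig, SIG_DFL);
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
static int install_sigbus_handler(void)
{
    struct sigaction sa;

    sa.sa_flags = SA_SIGINFO;   /* needed to receive siginfo_t */
    sa.sa_sigaction = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A handler written before this kind of SIGBUS existed has no such
si_code check, so it would treat the advisory signal like any other
bus error.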


> > Early-kill for KVM does seem like reasonable justification on the
> > surface, but when I think more about it, I wonder whether the guest
> > actually stands any better chance of correcting the error if it is
> > reported at time T rather than T+delta. (Who knows what the page
> > will be used for at any given time?)
>
> Early kill makes a lot of difference for KVM. Think about the vast
> amount of clean page cache pages. With early kill the page can be
> trivially isolated. With late kill the whole virtual machine dies
> hard.

Why? In both cases it will enter the exception handler and
attempt to do something about it... in both cases I would
have thought there is some chance that the page error is not
recoverable and some chance it is recoverable. Or am I
missing something?

Anyway, I would like to see a basic analysis of those probabilities
to justify early kill. Not saying there is no justification, but
it would be helpful to see why.

Thanks,
Nick

