Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Nick Piggin
Date: Tue Jun 02 2009 - 09:47:52 EST


On Tue, Jun 02, 2009 at 03:46:10PM +0200, Andi Kleen wrote:
> On Tue, Jun 02, 2009 at 03:19:37PM +0200, Nick Piggin wrote:
> > > I assume that if an application does something with EIO it
> > > can either retry a few times or give up. Both is ok here.
> >
> > That's exactly the case where it is not OK, because the
> > dirty page was now removed from pagecache, so the subsequent
> > fsync is going to succeed and the app will think its dirty
> > data has hit disk.
>
> Ok that's a fair point -- that's a hole in my scheme. I don't
> know of a good way to fix it though. Do you?
>
> I suspect adding a new errno would break more cases than fixing
> them.

Right, I wasn't too serious about the new errno, because I just
don't know the full consequences (although maybe others have
opinions about its feasibility?).

I was kind of thinking that we could SIGKILL them as they try
to access it or fsync it. But then the question is how long to
keep SIGKILLing. At one end of the scale you could do the stupid
and simple thing: have another error flag in the mapping that
triggers the SIGKILL just once, on the next read/write/fsync etc.
At the other end, you keep the page in the pagecache, poisoned,
and kill everyone who touches it until the page is explicitly
truncated by userspace. I don't really know...
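
To make the two ends of that scale concrete, here is a toy
userspace model, a minimal sketch only. Every name in it
(toy_mapping, toy_memory_failure, toy_access, toy_truncate) is
hypothetical and not a kernel interface, and the SIGKILL is
modelled as a return value rather than a real signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names exist in the kernel. */
enum poison_policy {
	POLICY_KILL_ONCE,        /* one-shot error flag in the mapping */
	POLICY_KILL_UNTIL_TRUNC  /* poisoned page stays in pagecache */
};

enum access_result { ACCESS_OK, ACCESS_SIGKILL };

struct toy_mapping {
	bool poison_flag;    /* set once, cleared by the first access */
	bool poisoned_page;  /* stays set until explicit truncate */
};

/* Record a memory error on a dirty page under the chosen policy. */
static void toy_memory_failure(struct toy_mapping *m,
			       enum poison_policy policy)
{
	assert(m != NULL);
	if (policy == POLICY_KILL_ONCE)
		m->poison_flag = true;
	else
		m->poisoned_page = true;
}

/* Stand-in for a read/write/fsync that hits the affected range. */
static enum access_result toy_access(struct toy_mapping *m)
{
	assert(m != NULL);
	if (m->poisoned_page)
		return ACCESS_SIGKILL;   /* kill every time */
	if (m->poison_flag) {
		m->poison_flag = false;  /* kill just once */
		return ACCESS_SIGKILL;
	}
	return ACCESS_OK;
}

/* Stand-in for userspace explicitly truncating the poisoned page. */
static void toy_truncate(struct toy_mapping *m)
{
	assert(m != NULL);
	m->poisoned_page = false;
}
```

In this model the one-shot flag reports the loss to whichever
caller arrives first; a second toy_access() returns ACCESS_OK,
so a later fsync would again look successful. The keep-poisoned
variant returns ACCESS_SIGKILL on every access until
toy_truncate() is called.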
