Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

From: Nick Piggin
Date: Tue Jun 02 2009 - 10:08:19 EST


On Tue, Jun 02, 2009 at 09:30:19PM +0800, Wu Fengguang wrote:
> > No I mean the difference between the case of dirty page unable to
> > be written to backing store, and the case of dirty page becoming
> > corrupted.
>
> legacy EIO: may succeed on retry (possibly after doing something first)?

Legacy EIO yes, I imagine most programs are assuming that the
cache is still the most recent (and valid) copy of the data.


> hwpoison EIO: a permanent unrecoverable error
>
> > They would presumably exit or do some default thing, which I
> > think would be fine. Actually if your code catches them in the
> > act of manipulating a corrupted page (ie. if it is mmapped),
> > then it gets a SIGBUS.
>
> That's OK. filemap_fault() returns VM_FAULT_SIGBUS for legacy EIO,
> while hwpoison pages will return VM_FAULT_HWPOISON. Both kill the
> application I guess?

Yes I was just using it to illustrate the difference. filemap_fault
does SIGBUS for read failures, sure, but if you msync and get an
EIO (legacy EIO), then it is not going to SIGBUS to all procs mapping
the page.
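
To make the msync case concrete, here is a minimal userspace sketch in C.
The helper names (flush_mapping, dirty_and_flush_tmpfile) and the /tmp
path are made up for illustration. It only shows where a legacy EIO would
surface: as a return value in the process that calls msync(), not as a
signal to other processes mapping the same page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Flush a shared mapping. Returns 0 on success or a negative errno.
 * A failed writeback would be reported here as -EIO, to this caller only. */
int flush_mapping(void *addr, size_t len)
{
    assert(addr != NULL);
    if (msync(addr, len, MS_SYNC) == -1)
        return -errno;
    return 0;
}

/* Hypothetical helper: dirty one page of a temporary file through a
 * shared mapping, then flush it. Returns 0 or a negative errno. */
int dirty_and_flush_tmpfile(void)
{
    char path[] = "/tmp/msync-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    char *p;
    int fd, ret;

    fd = mkstemp(path);
    if (fd == -1)
        return -errno;
    unlink(path);               /* the file stays alive until close() */

    if (ftruncate(fd, page) == -1) {
        ret = -errno;
        close(fd);
        return ret;
    }

    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        ret = -errno;
        close(fd);
        return ret;
    }

    memcpy(p, "dirty", 5);      /* dirty the page via the mapping */
    ret = flush_mapping(p, (size_t)page);

    munmap(p, (size_t)page);
    close(fd);
    return ret;
}
```

On a healthy filesystem dirty_and_flush_tmpfile() returns 0. A caller that
sees -EIO has to decide for itself what to do with its cached copy.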


> read()/write() are the more interesting cases.

Yes.


> With read IO interception, the read() call will succeed.
>
> The write() call has to fail. But interestingly writes are
> mostly delayed ones, and we have only one AS_EIO bit for the entire
> file, which will be cleared after the EIO reporting. And the poisoned
> page will be isolated (if that succeeds) and later read()/write() calls
> won't even notice there was a poisoned page!
>
> How are we going to fix this mess? EIO errors seem to be fuzzy and
> temporary by nature at least in the current implementation, and hard

Well that is a problem too. It is questionable how long to keep
legacy EIO reporting around (I'm of the opinion that we really
need to keep the errors around forever, and clear them only on
truncate or via a new syscall to discard them). But this is another
discussion because we already have these existing semantics, so
there is little point in changing them quickly :)


> to be improved to be exact and/or permanent in both implementation and
> interface:
> - can/shall we remember the exact EIO page? maybe not.

If you add a new bit in the mapping, you could then call into the
error recovery code to do slowpath checking for overlapping page
offsets. It gets tricky if you want to allow the inode to be
reclaimed and still remember the errors ;)
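
A rough userspace sketch of that idea in C, assuming a small fixed-size
table of bad page offsets hanging off the mapping. The ranged_mapping
struct, RANGED_MAX_BAD and the function names are invented for this
example, and the sketch ignores the inode reclaim problem entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RANGED_MAX_BAD 16

struct ranged_mapping {
    bool has_error;                     /* the "new bit": cheap fast-path test */
    unsigned long bad[RANGED_MAX_BAD];  /* page offsets that hit an error */
    size_t nr_bad;
};

/* Remember that the page at pgoff had an error. Returns 0, or -1 if the
 * table is full. */
int ranged_record_error(struct ranged_mapping *m, unsigned long pgoff)
{
    assert(m != NULL);
    if (m->nr_bad == RANGED_MAX_BAD)
        return -1;
    m->bad[m->nr_bad++] = pgoff;
    m->has_error = true;
    return 0;
}

/* Does the inclusive page range [first, last] overlap a recorded error? */
bool ranged_has_error(const struct ranged_mapping *m,
                      unsigned long first, unsigned long last)
{
    size_t i;

    assert(m != NULL);
    if (!m->has_error)          /* fast path: nothing ever went wrong */
        return false;

    for (i = 0; i < m->nr_bad; i++)     /* slow path: check each offset */
        if (m->bad[i] >= first && m->bad[i] <= last)
            return true;
    return false;
}
```

With this shape, a read() or write() covering pages 4..6 would see an
error recorded at page 5, while one covering pages 6..9 would not.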

> - can EIO reporting be permanent? sounds like a horrible user interface..

[Let's describe the ideal world:
We'd have EBADMEM that everyone knows about, and we have a syscall
that can clear these errors/bad pages. Maybe even another syscall
which can read back the contents of this memory without being SIGBUSed
or EBADMEMed.]

Now I have been of the opinion that our current (legacy) EIO should
be permanent (unless the pages end up being able to be written back),
and we should have another syscall to clear this condition.

Unaware applications may have some difficulties, but a cmd line utility
can clear these so it could easily be recovered...

I think this might work for hwpoison as well (whether it ends up using
EIO or something else).
