Re: [PATCH 0/5] [RFC] HWPOISON incremental fixes

From: Andi Kleen
Date: Fri Jun 12 2009 - 06:48:31 EST


On Thu, Jun 11, 2009 at 10:22:39PM +0800, Wu Fengguang wrote:
> Hi all,
>
> Here are the hwpoison fixes that aims to address Nick and Hugh's concerns.
> Note that
> - the early kill option is dropped for .31. It's obscure option and complex
> code and is not must have for .31. Maybe Andi also aims this option for
> notifying KVM, but right now KVM is not ready to handle that.

KVM is ready to handle it; patches for that have been submitted and
are queued.

Also, without early kill it's not really possible right now to recover
in the guest. And for some other scenarios early kill is much easier
to handle than late kill: with late kill you always have to bail
out of your current execution context, while with early kill the
handling can be done out of line (e.g. by just dropping a corrupted
object, similar to what the kernel does). That's a much nicer and
gentler model than late kill.
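
To make the difference concrete, here is a rough user-space sketch of a
SIGBUS handler that treats the two cases differently. It assumes the
si_code values BUS_MCEERR_AO ("action optional", i.e. early kill) and
BUS_MCEERR_AR ("action required", i.e. late kill) that mainline Linux
later exposed; the interface was still under discussion in this thread.
The enum, the function names and the pending_poison flag are made up
for illustration.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks for older headers; 4 and 5 are the values Linux uses. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical application-level classification of a SIGBUS. */
enum poison_action {
    POISON_NOT_MCE,      /* ordinary SIGBUS, unrelated to hwpoison */
    POISON_DROP_LATER,   /* early kill: clean up out of line */
    POISON_BAIL_OUT_NOW  /* late kill: current context cannot continue */
};

enum poison_action classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return POISON_DROP_LATER;
    case BUS_MCEERR_AR:
        return POISON_BAIL_OUT_NOW;
    default:
        return POISON_NOT_MCE;
    }
}

/* Set by the handler, polled by the application's main loop. */
static volatile sig_atomic_t pending_poison;
static void *volatile poisoned_addr;

void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;

    if (classify_sigbus(info->si_code) == POISON_DROP_LATER) {
        /* Early kill: nothing has consumed the bad data yet, so just
         * note the address and let normal code drop the object later. */
        poisoned_addr = info->si_addr;
        pending_poison = 1;
        return;
    }

    /* Late kill (or an unrelated SIGBUS): returning would re-execute
     * the faulting access. A real program might siglongjmp() to a
     * recovery point; this sketch simply exits. */
    _exit(1);
}

/* Returns 0 on success, -1 on failure (as sigaction() does). */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The early kill branch returns normally and defers the cleanup, while
the late kill branch has no choice but to abandon the interrupted
context.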

Of course very few programs will try to handle this, but if any do,
it's better to make it easy for them.

That we send too many signals in a few cases is not fatal right now,
I think. Remember that the alternative is always to die completely.

So please don't drop that code right now.


> - It seems that even fsync() processes are not easy to catch, so I abandoned
> the SIGKILL on fsync() idea. Instead, I choose to fail any attempt to
> populate the poisoned file with new pages, so that the corrupted page offset
> won't be repopulated with outdated data. This seems to be a safe way to allow
> the process to continue running while still be able to promise good (but not
> complete) data consistency.

The fsync() error reporting is already broken anyway, even without hwpoison,
for metadata errors, which also rely only on the address space bit rather
than the page and so run into all the same problems.
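
As a rough illustration of why a per-address-space bit is lossy, here is
a toy user-space model (not kernel code; all names are hypothetical). It
assumes the error is kept as one test-and-clear flag per mapping, with
no record of which page failed or which caller has already been told.

```c
#include <stdbool.h>

/* Toy stand-in for a file's address space: one error flag,
 * no per-page and no per-caller state. */
struct toy_mapping {
    bool io_error;
};

/* Record a failed write-back (or, by analogy, a poisoned page). */
void toy_mark_error(struct toy_mapping *m)
{
    m->io_error = true;
}

/* Models fsync(): report the error once and clear the flag.
 * Returns -1 if an error was pending, 0 otherwise. */
int toy_fsync(struct toy_mapping *m)
{
    if (m->io_error) {
        m->io_error = false;
        return -1;
    }
    return 0;
}
```

In this model whichever caller syncs first consumes the error; a second
caller syncing the same file afterwards sees success.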

I don't think we need to be better here than normal metadata.

Possibly if metadata can be fixed then hwpoison will be fixed too in the
same pass. But that's something longer term.

> - I didn't implement the PANIC-on-corrupted-data option. Instead, I guess
> sending uevent notification to user space will be a more flexible scheme?

Normally you can get very aggressive panics by setting the x86 MCE tolerant
level to 0 (default is 1); I suspect that will be good enough.
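
For reference, a small sketch of setting that level from a program.
The sysfs path is the one documented for x86-64 kernels of this era
and should be treated as an assumption on any given system (it needs
root, and may be absent on other kernels); set_mce_tolerant() is a
made-up helper name.

```c
#include <stdio.h>

/* Assumed location of the tunable; check your kernel's documentation. */
#define MCE_TOLERANT_PATH \
    "/sys/devices/system/machinecheck/machinecheck0/tolerant"

/* Write a tolerance level (0..3) to the given file.
 * Returns 0 on success, -1 on a bad level or an I/O failure. */
int set_mce_tolerant(const char *path, int level)
{
    FILE *f;
    int ok;

    if (level < 0 || level > 3)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", level) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling set_mce_tolerant(MCE_TOLERANT_PATH, 0) would request the
aggressive panic behaviour; taking the path as a parameter keeps the
helper testable against an ordinary file.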

If other architectures add hwpoison support presumably they can add
a similar tunable.

Doing that in the low level handler is better than in the high level
VM because there are some corruption cases which are not reported
to the high level (e.g. those not affecting memory directly).

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.