Re: [PATCH 3/3] HWPOISON: improve handling/reporting of memory error on dirty pagecache

From: Naoya Horiguchi
Date: Fri Aug 10 2012 - 21:01:25 EST


Hello,

On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote:
> Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> writes:
>
> > Current error reporting of memory errors on dirty pagecache has silent
> > data lost problem because AS_EIO in struct address_space is cleared
> > once checked.
>
> Seems very complicated. I think I would prefer something simpler
> if possible, especially unless it's proven the case is common.
> It's hard to maintain rarely used error code when it's complicated.

I'm not sure whether memory errors are rare, because I don't have
any numbers from real systems. But assuming that hwpoison events
are not rare, errors on dirty pagecache are not an ignorable case,
because dirty pages typically make up ~10% of total physical memory
on average systems. That share may be small, but it is not negligible.

> Maybe try Fengguang's simple proposal first? That would fix other IO
> errors too.

In my understanding, Fengguang's patch (referenced in this patch's
description) only fixes memory error reporting. And I'm not sure
that a similar approach (like making AS_EIO sticky) really fixes
the other IO errors, because such a change can break userspace
applications which expect the current behavior.
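
To make the difference concrete, here is a minimal userspace sketch
(not kernel code) of the two semantics. The names mapping_sim,
sim_set_eio, sim_check_and_clear and sim_check_sticky are hypothetical
and only stand in for the AS_EIO bit in struct address_space; the
sketch assumes nothing about how a real patch would implement either
behavior.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AS_EIO bit of a struct address_space. */
struct mapping_sim {
	bool eio;
};

/* Record that an error happened on this mapping. */
void sim_set_eio(struct mapping_sim *m)
{
	m->eio = true;
}

/*
 * Current behavior: the flag is cleared once checked, so only the
 * first caller that checks after the error sees -EIO.
 */
int sim_check_and_clear(struct mapping_sim *m)
{
	if (m->eio) {
		m->eio = false;
		return -EIO;
	}
	return 0;
}

/*
 * "Sticky" variant: the flag is left set, so every later check keeps
 * returning -EIO until something else clears it.
 */
int sim_check_sticky(const struct mapping_sim *m)
{
	return m->eio ? -EIO : 0;
}
```

With the first variant, a second check after one error returns 0, so
a later caller never learns about the error. With the sticky variant,
an application that retries after the first -EIO and expects success
would keep failing, which is the kind of userspace breakage I am
worried about.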

Anyway, OK, I agree to start with Fengguang's approach and to separate
out the additional suggestion about "making dirty pagecache error
recoverable". If possible, I would also like your feedback on that
additional part of my idea. Could I ask you for that?

Thanks,
Naoya