Re: [PATCH 2/3] ext4: introduce ext4_error_remove_page

From: Andi Kleen
Date: Mon Oct 29 2012 - 15:06:57 EST


Theodore Ts'o <tytso@xxxxxxx> writes:
>
> It's actually pretty easy to test this particular one,

Note the error can happen at any time.

> and certainly
> one of the things I'd strongly encourage in this patch series is the
> introduction of an interface via madvise

It already exists of course.

I would suggest to study the existing framework before more
suggestions.

> simulate an ECC hard error event. So I don't think "it's hard to
> test" is a reason not to do the right thing. Let's make it easy to

What you can't test doesn't work. It's that simple.

And memory error handling is extremly hard to test. The errors
can happen at any time. It's not a well defined event.
There are test suites for it of course (mce-test, mce-inject[1]),
but they needed a lot of engineering effort to be at where
they are.

[1] despite the best efforts of some current RAS developers
at breaking it.

> Note that the problem that we're dealing with is buffered writes; so
> it's quite possible that the process which wrote the file, thus
> dirtying the page cache, has already exited; so there's no way we can
> guarantee we can inform the process which wrote the file via a signal
> or a error code return.

Is that any different from other IO errors? It doesn't need to
be better.

> Also, if you're going to keep this state in memory, what happens if
> the inode gets pushed out of memory?

You lose the error, just like you do today with any other IO error.

We had a lot of discussions on this when the memory error handling
was originally introduced, that was the conclusuion.

I don't think a special panic knob for this makes sense either.
We already have multiple panic knobs for memory errors, that
can be used.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/