Re: [PATCH 2/3] ext4: introduce ext4_error_remove_page

From: Theodore Ts'o
Date: Mon Oct 29 2012 - 14:25:02 EST


On Mon, Oct 29, 2012 at 03:37:05AM -0700, Andi Kleen wrote:
> > Agreed, if we're going to add an xattr, then we might as well store
>
> I don't think an xattr makes sense for this. It's sufficient to keep
> this state in memory.
>
> In general these error paths are hard to test and it's important
> to keep them as simple as possible. Doing IO and other complexities
> just doesn't make sense. Just have the simplest possible path
> that can do the job.

It's actually pretty easy to test this particular one, and certainly
one of the things I'd strongly encourage in this patch series is the
introduction of an interface via madvise and fadvise that allows us to
simulate an ECC hard error event. So I don't think "it's hard to
test" is a reason not to do the right thing. Let's make it easy to
test, and the include it in xfstests. All of the core file systems
are regularly running regression tests via xfstests, so if we define a
standard way of testing this function (this is why I suggested
fadvise/madvise instead of an ioctl, but in a pinch we could do this
via an ioctl instead).

Note that the problem that we're dealing with is buffered writes; so
it's quite possible that the process which wrote the file, thus
dirtying the page cache, has already exited; so there's no way we can
guarantee we can inform the process which wrote the file via a signal
or a error code return. It's also possible that part of the file has
been written out to the disk, so forcibly crashing the system and
rebooting isn't necessarily going to save the file from being
corrupted.

Also, if you're going to keep this state in memory, what happens if
the inode gets pushed out of memory? Do we pin the inode? Or do we
just say that it's stored into memory until some non-deterministic
time (as far as userspace programs are concerned, but if they are
running in a tight memory cgroup, it might be very short time later)
suddenly the state gets lost?

I think it's fair that if there are file systems that don't have a
single bit they can allocate in the inode, we can either accept
Jun'ichi's suggest to just forcibly crash the system, or we can allow
the state to be stored in memory. But I suspect the core file systems
that might be used by enterprise-class workloads will want to provide
something better.

I'm not that convinced that we need to insert an xattr; after all, not
all file systems support xattrs at all, and I think a single bit
indicating that the file has corrupted data is sufficient. But I
think it would be useful to at least optionally support a persistent
storage of this bit.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/