Re: [RFC][PATCH] ext3: don't read inode block if the buffer has a write error

From: Nick Piggin
Date: Mon Jun 23 2008 - 07:46:52 EST


On Monday 23 June 2008 21:25, Hidehiro Kawai wrote:
> A transient I/O error can corrupt inode data. Here is the scenario:
>
> (1) update inode_A at the block_B
> (2) pdflush writes out new inode_A to the filesystem, but it results
> in write I/O error, at this point, BH_Uptodate flag of the buffer
> for block_B is cleared and BH_Write_EIO is set
> (3) create new inode_C which located at block_B, and
> __ext3_get_inode_loc() tries to read on-disk block_B because the
> buffer is not uptodate
> (4) if it can read on-disk block_B successfully, inode_A is
> overwritten by old data
>
> This patch makes __ext3_get_inode_loc() not read the inode block if
> the buffer has BH_Write_EIO flag. In this case, the buffer should
> have the latest information, so setting the uptodate flag to the
> buffer (this avoids WARN_ON_ONCE() in mark_buffer_dirty().)
>
> According to this change, we would need to test BH_Write_EIO flag for
> the error checking. Currently nobody checks write I/O errors on
> metadata buffers, but it will be done in other patches I'm working on.

IMO there is a problem with all the buffer head and pagecache error
handling in that uptodate gets cleared on write errors. This is not
only wrong (because the in-memory copy continues to contain the most
uptodate copy), but it will trigger assertions all over the mm and
buffer code AFAIKS.

I don't know why it was done like this, or if anybody actually tested
any of it, but AFAIKS the best way to fix this is to simply not
clear any uptodate bits upon write errors.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/