Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

From: Nix
Date: Thu Oct 25 2012 - 10:15:56 EST


On 25 Oct 2012, Theodore Ts'o stated:

> I've been thinking about this some more, and if you don't have a lot
> of time,

I've got time, but it's this weekend, not during the week :)

> perhaps the most important test to do is this. Does the
> chance of your seeing corrupted files in v3.6.3 go down if you run
> 3.6.3 with commit 14b4ed22a6 reverted?

This I can verify, sometime this evening. (I presume what we're really
interested in is whether the window in which files get corrupted has
narrowed enough that my 5s sleep after umount makes corruption less
likely. We already know that a near-0s sleep after umount causes
corruption almost every time on 3.6.1 as well: I've now done that three
times and got corruption every time.)

> But most importantly, even if the bug doesn't show up with the default
> mount options at all (which explains why Eric and I weren't able to
> reproduce it), there are probably other users using nobarrier, so if
> the frequency with which you were seeing corruptions went up
> significantly between 3.6.1 and 3.6.3, and reverting 14b4ed22a6 brings
> the frequency back down to what you were seeing with 3.6.1, we should
> do that ASAP.

Agreed.

--
NULL && (void)