Re: [GIT PULL] Core block IO bits for 2.6.39 - early Oops

From: Theodore Tso
Date: Fri Mar 25 2011 - 07:59:13 EST



On Mar 25, 2011, at 12:41 AM, Dave Chinner wrote:

>>
>> It works insofar as the Oops is gone. But my xfs partitions apparently
>> still get corrupted (I had to run xfs_repair on several of them, because
>> they would not mount otherwise).
>
> So the patchset is causing repeatable filesystem corruption? Sounds
> to me like this series is not yet ready for mainline merging. Last
> thing I want to spend the .39 cycle helping people recover busted
> filesystems as a result of undercooked block layer changes...

FYI, I did a trial merge last night of the ext4 changes with
the tip of Linus's tree. The ext4 changes (based on 2.6.38-rc5)
survived xfstests -g auto before I merged in Linus's 2.6.39 master
branch. After I merged with 2.6.39-tip, I reran xfstests, and it got
past test #13 (fsstress), which normally means that everything is
OK, so I sent a pull request to Linus. Much later (-g auto takes a
long time), I got an OOPS inside the virtio driver. Ext4 was nowhere
in the stack trace, but of course the block layer was. Grumbling
that someone had broken virtio during the merge window, I switched
my KVM setup to use SATA emulation and used the sda devices
instead. This time I got an oops in the block I/O layer, again quite
late in xfstests. Somewhere around test #224 or so if I remember
correctly.

It was too late last night to do any more investigating, which is why
I hadn't sent a formal report yet, but next up is for me to retry xfstests
before merging in my changes, and then to start a git bisect.

So before accusing some patch series which hasn't been merged
into 2.6.39 yet, you might want to also worry about some change
that already has been merged. Of course the symptoms for me are
quite different. I'm not seeing an early oops, but only something
which shows up when the system is put under a lot of stress
by xfstests. So it could be a different problem....

- Ted

P.S. And of course there is the chance that there is some
subtle bug in the ext4 branch, which worked just fine when
it was just based on 2.6.38-rc5, but which only manifested
itself when I merged in the tip of Linus's branch. So I'm not
__accusing__ the block layer yet, even though the stack traces
seem to point that way, because I don't have a smoking gun
yet. But I do have to admit I'm suspicious....

