Re: Error testing ext3 on brd ramdisk

From: Nick Piggin
Date: Wed Mar 18 2009 - 08:15:53 EST


On Tue, Mar 17, 2009 at 11:40:19AM +0200, Denis Karpov wrote:
> Hello,
>
> first off, sorry if you are getting this email twice.

No problem. I haven't been able to reproduce it myself, but Jan Kara
has just fixed some issues which could explain it: they happen under
memory pressure, so I may not have triggered them because I didn't
put the system under pressure.

Jan's fixes are here:

http://marc.info/?l=linux-ext4&m=123731584711382&w=2

It would be interesting to try them. If they don't help, he may be
interested as well, so I've cc'ed him.


> I also tried to do ext3/ext4 fs smoketesting and used Adrian's
> script. I am consistently getting the same result: the filesystem
> gets corrupted.
> I tested on a quad Xeon, with the patches posted in this thread.
>
> 1. tests with brd:
> - ext3fs on brd
> corruption (see attached ext3fs.brd.corruption.txt)
> - ext4fs on brd
> corruption (see attached ext4fs.brd.corruption.txt)
>
> In both cases I saw some complaints from JBD/JBD2:
> JBD: Detected IO errors while flushing file data on
>
> 2. I enabled JBD debugging and re-ran the tests. The console was
> flooded with messages and in the end I got a soft lockup.
> I cannot consistently reproduce this (see attached
> brd.ext3fs.softlock.txt).
>
> Just to be sure, I re-ran the tests on a real block device (usb stick).
>
> 3. tests with real block device (usb stick)
> - ext3fs
> no fs corruption (overnight run)
> - ext4fs
> no fs corruption (overnight run)

It's possible the real block device is not fast enough to trigger
it, or that its different timings don't trigger it (brd requests
complete immediately, whereas real devices tend to complete later,
from (soft)interrupt context).

Or it could be that brd is consuming some more memory to push
the system into reclaim and exposing those bugs Jan has fixed...


> Any ideas what else can be done here? I'd like to find out if this is
> filesystem or brd related fault.

Yes, thanks for persisting. Could you test the patches and see
if they help? If not, does ext2 show corruption? How about ext3
on a loop device (with the backing file on tmpfs/ramfs for speed)?
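
For the loop device variant, something along these lines might do as a
starting point. It is only a rough sketch: the function name, paths,
sizes and loop device below are made-up placeholders, and it just
builds and prints the command lines (run them by hand as root) rather
than executing anything.

```python
import shlex


def loop_fs_commands(fstype="ext3", backing_dir="/mnt/ramfile",
                     image_name="fs.img", size_mb=256,
                     mount_point="/mnt/test", loop_dev="/dev/loop0"):
    """Return the commands to put `fstype` on a loop device whose
    backing file lives on tmpfs. All defaults are hypothetical."""
    image = f"{backing_dir}/{image_name}"
    return [
        # tmpfs sized slightly larger than the image it will hold
        ["mount", "-t", "tmpfs", "-o", f"size={size_mb + 16}m",
         "tmpfs", backing_dir],
        # zero-filled backing file for the loop device
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
        # attach the file to the loop device
        ["losetup", loop_dev, image],
        # create and mount the filesystem under test
        [f"mkfs.{fstype}", "-q", loop_dev],
        ["mount", "-t", fstype, loop_dev, mount_point],
    ]


if __name__ == "__main__":
    # Dry run only: print the commands for ext3, then for ext2.
    for fs in ("ext3", "ext2"):
        for cmd in loop_fs_commands(fstype=fs):
            print(shlex.join(cmd))
        print()
```

The smoketest script would then be pointed at the mount point. Passing
fstype="ext2" gives the ext2 comparison on the same setup.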

Thanks,
Nick