Re: raid5 crash

From: Stephen C. Tweedie
Date: Thu Jan 13 2005 - 10:02:28 EST


Hi,

On Wed, 2004-12-22 at 23:08, Neil Brown wrote:

> > kernel BUG at drivers/md/raid5.c:813!
>
> This BUG happens when there are two outstanding read (or write)
> requests for the same piece of storage (more accurately, two "bio"s
> that overlap).

Ouch.

> raid5 cannot currently handle this situation.
> Most filesystems would never make requests like this.

I don't think there's anything to stop me doing two O_DIRECT read()s
from the same bit of a file at the same time. As far as I can see, this
should be easily triggered by user-space.

And if you get corruption in a filesystem such that two files share the
same block, then this possibility arises again. That can happen due to
corruption in an indirect block (ext2/3) or in the reiserfs tree; or
more commonly due to a bitmap block getting corrupted, resulting in the
same block being allocated twice.

This is a situation we really need to handle. ext3 goes to great
lengths to make sure that if such cases happen, the worst that results
should be the filesystem taking itself readonly cleanly.

It's really bad behaviour for a fault such as a bad IDE cable to be able
to oops the entire kernel.

> I doubt very much that this would happen with ext3.

It certainly shouldn't do so for buffered IO if things are running fine,
but as soon as you get corrupt data, or start using O_DIRECT, it's
possible.

Cheers,
Stephen


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/