Re: Errors and later panics in 2.6.0-test11.

From: Neil Brown
Date: Wed Dec 03 2003 - 15:12:13 EST


On Wednesday December 3, torvalds@xxxxxxxx wrote:
>
>
> On Wed, 3 Dec 2003, Jens Axboe wrote:
> > >
> > > Interesting. Another RAID 0 problem report..
> >
> > Hmm did _all_ reports include raid-0, or just "some" raid? I'm looking
> > at the bio_pair stuff which raid-0 is the only user of, something looks
> > fishy there.
>
> The ones I've seen seem to be raid-0, but Nathan (nathans@xxxxxxx)
> reported problems in RAID-5 under load. I didn't decode the full oops on
> that one, but it really looked like a stale "bi" bio that trapped on the
> PAGE_ALLOC debug code.
>

Nathan had a second oops, which turned out to be a bad bi_next pointer
in a bio that raid5 had almost finished writing out.
So there does seem to be something wrong with bio handling, quite
possibly in raid5.

The only thing I could find was that if raid5 received two overlapping
bios concurrently (or at least received the second before it had
finished with the first), it could get confused. I've asked Nathan to
try a patch that BUGs when that happens.
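The condition being trapped can be sketched as a plain sector-range
overlap test. This is a hypothetical userspace illustration, not raid5's
code and not the actual debugging patch: struct fake_bio, bios_overlap(),
check_new_bio() and FAKE_BUG_ON() are invented names. The fields only
loosely mirror a 2.6-era bio's start sector and byte count, assuming
512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for two fields of a 2.6-era struct bio.
 * Not the kernel's definition. */
struct fake_bio {
    unsigned long long sector; /* first sector, 512-byte units */
    unsigned int size;         /* length in bytes */
};

/* End of the half-open sector range [sector, end) the request covers. */
static unsigned long long bio_end_sector(const struct fake_bio *b)
{
    return b->sector + (b->size >> 9);
}

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool bios_overlap(const struct fake_bio *a, const struct fake_bio *b)
{
    return a->sector < bio_end_sector(b) && b->sector < bio_end_sector(a);
}

/* Userspace stand-in for the kernel's BUG(): report and abort. */
#define FAKE_BUG_ON(cond)                                             \
    do {                                                              \
        if (cond) {                                                   \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);    \
            abort();                                                  \
        }                                                             \
    } while (0)

/* Hypothetical debug check: 'in_flight' is a request still being handled,
 * 'incoming' is a new one. Trap loudly instead of silently mixing them. */
static void check_new_bio(const struct fake_bio *in_flight,
                          const struct fake_bio *incoming)
{
    FAKE_BUG_ON(bios_overlap(in_flight, incoming));
}
```

With 4096-byte requests, one starting at sector 0 covers sectors 0..7,
so a second starting at sector 4 overlaps it and would trap, while one
starting at sector 8 is merely adjacent and passes. The half-open range
keeps adjacent requests from being flagged as overlapping.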

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/