Re: RAID5 bug in 2.1.103

Richard Jones (rjones@orchestream.com)
Mon, 01 Jun 1998 16:08:20 +0000

Next message: Kenneth Albanowski: "Schedule within block request routines?"
Previous message: Wayne J. Salamon: "TCP throughput"
Maybe in reply to: Mike Black: "RAID5 bug in 2.1.103"

Mike Black wrote:
>
> Had 2.1.103 running for 10 days and started seeing these this
> morning...anybody care to comment...fixes...etc?
>
> Jun 1 10:13:10 medusa kernel: raid5: bug: stripe->bh_new[0], sector 1614216 exists
> Jun 1 10:13:10 medusa kernel: raid5: bh c2e64c80, bh_new c25e8c80

Yes ... been there, seen that!

It seems to be caused by a bug in the higher layers of the
kernel, which pass down two buffer heads (the `bh' pointers)
referring to the same block on disk. My investigations have led
me to believe that the bug is in `ext2_truncate', perhaps a race
condition when two processes execute this function and some
other one in parallel, with certain specific parameters to the
truncate syscall. I've still not found the bug, though, nor
a minimal program which exposes it (but I am trying :-)

Anyway, Gadi Oxman, who wrote much of the original RAID code,
has a patch which works really well. (It patches the symptoms,
not the underlying cause.) His patch and message are attached.

Rich.

Subject: Re: RAID5 (or ext2_truncate?) bug
Date: Tue, 21 Apr 1998 22:13:52 +0400 (IDT)
From: Gadi Oxman <gadio@netvision.net.il>
To: Richard Jones <rjones@orchestream.com>, Linus Torvalds <torvalds@transmeta.com>,
MOLNAR Ingo <mingo@chiara.csoma.elte.hu>, Miguel de Icaza <miguel@nuclecu.unam.mx>,
"Theodore Y. Ts'o" <tytso@MIT.EDU>

Hi,

> Gadi:
>
> We just got a message from your patch ...
>
> raid5: bug: stripe->bh_new[1], sector 13749952 exists
> raid5: bh cfb37720, bh_new c9119620
>
> The NFS server kept on running this time. Thanks :-)
>
> However, of course we have no stack trace, so we don't
> know if it hit the (possible) bug in ext2_truncate, or
> if it was caused by something else. Nevertheless, the timing
> and symptoms were very similar to what happened previously.
>
> I can now fairly reliably reproduce the report by:
>
> . having lots of NFS traffic
> . running htmerge (the final stage of the htdig
> search program)
>
> At some point during the htmerge, htmerge spawns off
> a sort process, and somewhere around this time we hit
> the bug.
>
> Of course, HTDIG takes about 6 hours to run, so it's not
> a great test ...
>
> Rich.

Thanks for the report Rich; looks like we are finally about to resolve
this long-standing cause of crashes with the RAID-5 driver :-)

The fact that bh != bh_new seems to confirm our assumption that
at least some cases of that problem are not caused directly by the RAID-5
driver -- we are receiving two simultaneous I/O requests for the same
disk buffer, pointed to by two different memory buffers.

The old RAID-5 behavior simply issued a warning, forgot about the old
buffer, and serviced only the new one, which probably led to the processes
stuck in the 'D' state, waiting on the old buffer.

Linus, I appended the RAID-5 patch below; the new behavior waits for
I/O on the first buffer to complete, and then services the new buffer,
which is more or less the behavior of a standard non-RAID device.
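
To make the difference between the two behaviors concrete, here is a
minimal userspace sketch. It is not the driver code: every name in it
(toy_bh, toy_stripe, toy_complete, toy_submit_old, toy_submit_new,
toy_run) is a hypothetical stand-in, and the "wait" is modelled by
completing the pending buffer synchronously rather than by sleeping
until the RAID thread finishes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct toy_bh { int id; int done; };          /* done = I/O finished, waiter woken */
struct toy_stripe { struct toy_bh *bh_new; }; /* a single pending-request slot */

/* Finish whatever is pending in the slot (stands in for the RAID thread). */
static void toy_complete(struct toy_stripe *sh)
{
    if (sh->bh_new) {
        sh->bh_new->done = 1;
        sh->bh_new = NULL;
    }
}

/* Old behavior: warn, forget the old buffer, service only the new one. */
void toy_submit_old(struct toy_stripe *sh, struct toy_bh *bh)
{
    if (sh->bh_new)
        printf("toy: slot busy, dropping bh %d for bh %d\n",
               sh->bh_new->id, bh->id);
    sh->bh_new = bh; /* the old buffer is never completed */
}

/* Patched behavior: let the pending buffer finish, then retry. */
void toy_submit_new(struct toy_stripe *sh, struct toy_bh *bh)
{
    while (sh->bh_new) {
        printf("toy: slot busy, waiting for bh %d before bh %d\n",
               sh->bh_new->id, bh->id);
        toy_complete(sh); /* models wake thread + wait, then "goto repeat" */
    }
    assert(sh->bh_new == NULL);
    sh->bh_new = bh;
}

/* Submit two buffers for the same slot; return how many get completed. */
int toy_run(void (*submit)(struct toy_stripe *, struct toy_bh *))
{
    struct toy_bh a = { 1, 0 }, b = { 2, 0 };
    struct toy_stripe sh = { NULL };

    submit(&sh, &a);
    submit(&sh, &b);   /* collides with a */
    toy_complete(&sh); /* final completion of whatever is queued */
    return a.done + b.done;
}
```

In this model toy_run(toy_submit_old) returns 1, because the first
buffer is never completed (the analogue of a process left in the 'D'
state), while toy_run(toy_submit_new) returns 2.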

However, receiving two different buffer heads which point to the
same disk buffer simultaneously seems to indicate a potential problem
in the higher levels of the kernel.

Last time it happened, Rich was running with Ingo's patch which forced
a stack trace at this point, and it always happened in combination
with truncate():

Using `/System.map-2.1.88' to map addresses to symbols.

>>EIP: d0815230 cannot be resolved
Trace: d0816f18
Trace: d0816ee5
Trace: d0815f23
Trace: c0164f0b <md_make_request+93/b0>
Trace: c01663e1 <ll_rw_block+169/1e0>
Trace: c0138888 <ext2_bread+3c/7c>
Trace: c013ca0e <ext2_truncate+e6/16c> <-***---
Trace: c0121b22 <do_truncate+56/70>
Trace: c0121c3f <sys_truncate+103/130>
Trace: c0109926 <system_call+3a/40>
Code:
Code: c7 05 00 00 00 movl $0x0,0x0
Code: 00 00 00 00 00
Code: 83 c4 10 addl $0x10,%esp
Code: b8 02 00 00 00 movl $0x2,%eax
Code: 8b 55 00 movl 0x0(%ebp),%edx
Code: 90 nop
Code: 90 nop
Code: 90 nop

Here is the RAID-5 patch:

Thanks,

Gadi

--- linux/drivers/block/raid5.c.old Tue Mar 31 17:39:19 1998
+++ linux/drivers/block/raid5.c Tue Mar 31 17:41:10 1998
@@ -1209,6 +1209,14 @@
 	sh->pd_idx = pd_idx;
 	if (sh->phase != PHASE_COMPLETE && sh->phase != PHASE_BEGIN)
 		PRINTK(("stripe %lu catching the bus!\n", sh->sector));
+	if (sh->bh_new[dd_idx]) {
+		printk("raid5: bug: stripe->bh_new[%d], sector %lu exists\n", dd_idx, sh->sector);
+		printk("raid5: bh %p, bh_new %p\n", bh, sh->bh_new[dd_idx]);
+		lock_stripe(sh);
+		md_wakeup_thread(raid_conf->thread);
+		wait_on_stripe(sh);
+		goto repeat;
+	}
 	add_stripe_bh(sh, bh, dd_idx, rw);
 
 	md_wakeup_thread(raid_conf->thread);

-- 
Richard Jones rjones@orchestream.com Tel: +44 171 598 7557 Fax: 460 4461
Orchestream Ltd. 125 Old Brompton Rd. London SW7 3RP PGP: www.four11.com
"boredom ... one of the most overrated emotions ... the sky is made of bubbles ..."

