"Got md request ..." pointer ... Was: Re: First draft list of 2.3.x "Things to fix"

From: Florian Lohoff (flo@rfc822.org)
Date: Wed Jan 12 2000 - 06:04:05 EST


On Tue, Jan 11, 2000 at 12:51:06PM -0500, Kristofer T. Karas wrote:
> Alan Cox wrote:
>
> > The order is approximate priority. It may also not be totally accurate.
> > Suggestions/fixes welcome.
>
> Until some kind soul submits an elegant fix, I will take even a completely
> buggered fix for anything that makes it impossible to even run a 2.3.x kernel;
> the one that affects me (and anybody else out there???) is the MD driver:
> * "Got md request, not good..."
> I can't even boot up /sbin/init. :-(
> If RAID 0.90 addresses this, fine; but if not, or that isn't going in for
> awhile, then I would sure hope that the popular work-around floating about (or
> some facsimile) will be applied in the interim.

The Raid drivers dont address this problem - It is seem with the LVM
and the raid code (New and Old afaik).

There has been a dirty fix a couple of days ago ... I found it on deja.com

On 1999-12-17 Mike Galbraith <mikeg@weiden.de> wrote:
> On Thu, 16 Dec 1999, Richard Gooch wrote:
>
> > Hi, all. I've just noticed that MD (with raid0) is broken in
> > 2.3.33. I get the following message:
> > Got md request, not good...
> >
> > This seems to be because DEVICE_REQUEST() is being called (which is
> > defined to be do_md_request(). Perhaps the block device request queue
> > changes is to blame? I didn't have this problem with 2.3.31.
> >
> > Anyone got a patch to fix this?
>
> Not a patch, but I did get it working. Yesterday I beat the living _tar_ out of my box and it
> works perfectly.
>
> The trouble (afaikt) is two fold..
>
> make_request() pulls the wrong queue for a raid device via
> q = get_queue(bh->b_dev)
> and then does
> if( !q->plugged )
> (q->request_fn)(q);
> which leads directly to do_md_request().
>
> ll_rw_block() contains..
>
> for (i = 0; i < nr; i++) {
> set_bit(BH_Req, &bh[i]->b_state);
> #ifdef CONFIG_BLK_DEV_MD
> if (MAJOR(bh[i]->b_dev) == MD_MAJOR) {
> md_make_request(bh[i], rw);
> continue;
> }
> #endif
> __make_request(q, MAJOR(bh[i]->b_rdev), rw, bh[i]);
> }
>
> .. ie it wants to substitute it's own handling where there is a
> handler (raid1/5), but in md_make_request() at the point where
> you fall through, you hit make_request() instead of the intended __make_request().
>
> What I did (because get_queue() and __make_request() are static) was to clone make_request() and
> call it make_request_redirect(). It is identical to make_request except it does..
> q = get_queue(bh->b_rdev) << remember this 'difference'
>
> The workaround (bandaid) in md_make_request() looks like this.
> err = mddev->pers->make_request(mddev, rw, bh);
> } else {
> extern void make_request_redirect(int,int,struct buffer_head *); make_request_redirect
> (MAJOR(bh->b_rdev), rw, bh);
> err = 0;
> }
>
> Comment out the printk in do_md_request(), apply the same bandaid to raid1.c and raid5.c and
> you're done.
>
> I have one tiiiiny little problem with this 'fix'. If you look at ll_rw_block() a few lines up
> from md_make_request(), you'll see.. /* Md remaps blocks now */
> bh[i]->b_rdev = bh[i]->b_dev;
>
> I <zzzt> don't get it, but <zzzt> it works. I'll wait for the movie.
>
> -Mike

This fix is probably only a pointer to the right direction

This problem basically prevents us from using raid0s of raid5s which
i tried (but failed). It occures with the LVM under high load as
Heinz told me. 2.2 is also affected.

Flo

-- 
Florian Lohoff		flo@rfc822.org		      	+49-5241-470566
 ...  The failure can be random; however, when it does occur, it is
 catastrophic and is repeatable  ...             Cisco Field Notice

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jan 15 2000 - 21:00:20 EST