Re: NFS question and NFS GPF PROBLEM

Michel LESPINASSE (walken@Studio.via.ecp.fr)
20 Jan 1997 18:14:34 GMT


Wolfram.Gloger wrote :

> > Can someone explain to me how req->rq_wait is initialised in the NFS
> > filesystem code ? The only place where I can see it being touched is
> > in nfsiod(), when the whole req structure is memset to zero.
> >
> > From then on, the only place where I can see rq_wait explicitly used
> > in the kernel (I looked with a big grep) is as an argument to wake_up or
> > interruptible_sleep_on....
> >
> > Probably I'm just missing something <BIG>
> >
> > In do_read_nfs_async, most of the req structure is initialized, but not
> > the rq_wait field. God I'm confused.

> This is OK, I think, because do_read_nfs_async only gets req structures
> that have been cleared in nfsiod(), like you've said above.

That's also what I understood when I read the source, but this doesn't
correspond to what I see in my GPF dumps :

Code: 110ad4 <wake_up+2c/e4> movl (%ebx),%edx
Code: 110ad6 <wake_up+2e/e4> movl 0x4(%ebx),%ebx
Code: 110ad9 <wake_up+31/e4> testl %edx,%edx

This ksymoops extract shows us that gcc has placed next in the ebx register.
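
To make it clearer where the faulting load sits, here is a simplified user-space sketch of the kind of list walk these three instructions seem to correspond to. Everything in it is my own reconstruction and an assumption: the names struct task, struct wait_entry, struct request and sketch_wake_up are hypothetical, and the loop ends when it comes back to the first entry, which is a simplification and not how the kernel terminates its walk. Only the field order (task at offset 0, next at offset 4 on a 32-bit machine) is taken from the disassembly above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration, NOT the kernel's definitions. */
struct task {
    int woken;
};

struct wait_entry {
    struct task *task;        /* offset 0: read by  movl (%ebx),%edx          */
    struct wait_entry *next;  /* offset 4 (32-bit): movl 0x4(%ebx),%ebx       */
};

struct request {
    int rq_id;                    /* hypothetical placeholder field */
    struct wait_entry *rq_wait;   /* head of the wait list          */
};

/*
 * Walk a circular wait list and mark every task as woken.
 * Returns the number of tasks woken, 0 for an empty queue,
 * or -1 if a NULL link is found in the middle of the chain.
 */
int sketch_wake_up(struct wait_entry **q)
{
    struct wait_entry *first, *next;
    int woken = 0;

    /* An all-zero (memset) request takes this early exit on common platforms. */
    if (q == NULL || (first = *q) == NULL)
        return 0;

    next = first;
    do {
        /* If next holds garbage, this first load is the one that faults. */
        struct task *p = next->task;
        next = next->next;
        if (p != NULL) {          /* corresponds to testl %edx,%edx */
            p->woken = 1;
            woken++;
        }
        if (next == NULL)
            return -1;            /* broken chain */
    } while (next != first);

    return woken;
}
```

With a request that has just been cleared by memset, sketch_wake_up(&req.rq_wait) returns 0 without dereferencing anything, so a fault at the first load suggests the pointer was no longer zero by the time it was read.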

GPF #1 : eax: 01ffae28 ebx: ffffffff ecx: 01ffae28 edx: 0000030b
GPF #2 : eax: 00010e28 ebx: 6e692f72 ecx: 00010e28 edx: 0000030b

Apparently, in wake_up we have next != NULL. But next was loaded from
req->rq_wait, so rq_wait must have been modified somewhere after the memset ???

That's what I don't understand : I cannot find where.
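
One thing that may help narrow this down is to look at the bad ebx values as raw bytes instead of as addresses. The small helper below is only a sketch (the name reg_to_ascii is mine, and it assumes a little-endian machine such as the i386): it lays out a 32-bit register value in memory order and shows each byte as ASCII, with '.' for anything non-printable.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: render a 32-bit register value as the four bytes
 * it would occupy in little-endian memory (lowest byte first).
 * Printable ASCII bytes are copied as-is, others become '.'.
 * out must have room for 5 chars; returns the number of printable bytes.
 */
int reg_to_ascii(uint32_t value, char out[5])
{
    int printable = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xff);
        if (b >= 0x20 && b <= 0x7e) {
            out[i] = (char)b;
            printable++;
        } else {
            out[i] = '.';
        }
    }
    out[4] = '\0';
    return printable;
}
```

For the ebx of GPF #2 (6e692f72) this gives the four characters "r/in", all printable, while ffffffff from GPF #1 gives "....". This is only a hint and not something the dumps prove, but it would be consistent with rq_wait having been overwritten by string data (a piece of a path name, maybe) and not with a field that was simply never initialised.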

> > usually do a lot of nfs accesses until "someday" I get a GPF. After this
> > GPF, the results vary : sometimes the reading process gets
> > locked and I cannot kill -9 it (despite the intr flag used at mount
> > time), sometimes the reading process gets killed. I also once got the
> > "wait_queue is bad" message while I was running the 2.0.27 kernel.
> > Sometimes my load average goes up to one, sometimes not.

> I've seen exactly this ! The following patch by Olaf Kirch (which
> will hopefully be in 2.0.29) fixed it for me:

[ patch not quoted ]

I'll go back to my rsize=8192 configuration (the instability went away when
I removed this parameter) and try this patch. I'll let you know if it
does not solve the problem.

Best regards,

Michel "Walken" LESPINASSE - Student at Ecole Centrale Paris (France)
www Email : walken@via.ecp.fr
(o o) VideoLan project : http://videolan.via.ecp.fr/
------oOO--(_)--OOo-------------------------------------------------------
Yow ! 1135 KB/s remote host TCP bandwidth over 10Mb/s ethernet. Beat that!