SMP NFS root problem in 2.1.117

Jes Sorensen (Jes.Sorensen@cern.ch)
Thu, 20 Aug 1998 16:04:14 +0200 (MET DST)


Hi

I just received a report about this from a colleague here at CERN. There
seems to be an SMP problem with NFS root in the latest kernel
versions. The problem occurs when the NFS root filesystem is being
mounted.

I guess this is something that needs to be looked at before 2.2.

Any ideas?

Jes
------- start of forwarded message -------
From: Steffen Luitz <Steffen.Luitz@cern.ch>
To: Jes Sorensen <Jes.Sorensen@cern.ch>
Subject: More information ...
Date: Thu, 20 Aug 1998 15:49:36 +0200 (MET DST)

Hi Jes,

our hardware configuration:

Tyan TahoeII, 2 x PII-266, 192 MByte, 2 x Adaptec 2940UW, 1 x EEpro100

Linux configuration:

Standard 2.1.117 without any patches

Drivers for 2940UW, EEPro100,
Root file system on NFS with BOOTP IP autoconfiguration
MD driver,
Serial console

When booting 2.1.117 I now also get an OOPS (I overlooked it this morning
because it had already been scrolled off the screen by the wait_on_bh
messages). After that I get the wait_on_bh messages and the system does
not respond to the network. 2.1.116 does not give me the OOPS, just the
wait_on_bh messages. 2.1.114 worked reasonably OK; I did not try 2.1.115.

I've included the OOPS and the wait_on_bh messages as well.

Cheers

Steffen

The OOPS (output of ksymoops):

Using `../System.map' to map addresses to symbols.

>>EIP: c0148d55 <nfs_flush_dirty_pages+181/1a4>
Trace: c01475f1 <nfs_file_close+31/58>
Trace: c0126aba <__fput+22/50>
Trace: c0126b37 <close_fp+4f/84>
Trace: c0126bcc <sys_close+60/7c>
Trace: c010afec <system_call+34/38>
Code: c0148d55 <nfs_flush_dirty_pages+181/1a4>
Code: c0148d55 <nfs_flush_dirty_pages+181/1a4> 8b 40 5c movl 0x5c(%eax),%eax
Code: c0148d58 <nfs_flush_dirty_pages+184/1a4> 8b 40 44 movl 0x44(%eax),%eax
Code: c0148d5b <nfs_flush_dirty_pages+187/1a4> 50 pushl %eax
Code: c0148d5c <nfs_flush_dirty_pages+188/1a4> e8 4f 99 02 00 call c01726b0 <rpc_clnt_sigunmask>
Code: c0148d61 <nfs_flush_dirty_pages+18d/1a4> 83 c4 08 addl $0x8,%esp
Code: c0148d64 <nfs_flush_dirty_pages+190/1a4> e9 af fe ff ff jmp fffffec3 <_EIP+fffffec3>

... and later I get the wait_on_bh messages repeating

...
wait_on_bh, CPU 0:
irq: 1 [0 1]
bh: 1 [0 1]
<[c0113d4b]> <[c017547e]> <[c01755ae]> <[c0174a39]> <[c0175c95]>
<[c0147c5c]> <[c0175824]> <[c0175919]>
wait_on_bh, CPU 0:
irq: 1 [0 1]
bh: 1 [0 1]
<[c0113d4b]> <[c017547e]> <[c01755ae]> <[c0174a39]> <[c0175c95]>
<[c0147c5c]> <[c0175824]> <[c0175919]>
...

The trace corresponds to

del_timer
__rpc_wake_up
rpc_wake_up_next
xprt_release
rpc_release_task
nfs_readpage_result
__rpc_execute
__rpc_schedule

(I looked this up manually, hopefully I got it right) ...
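A manual lookup like this can be cross-checked with a small script. The
following is only a sketch: it assumes a System.map in the usual
"address type name" format, and since the real System.map for this kernel
is not available here, the sample entries are reconstructed from the
ksymoops output above (each start address is the reported address minus
its offset). The "T" type letter and the helper names parse_system_map
and resolve are assumptions made for illustration.

```python
import bisect

# Sample in System.map format. Start addresses are derived from the
# ksymoops output (address minus offset); the type letter is assumed.
SAMPLE_MAP = """\
c010afb8 T system_call
c0126a98 T __fput
c0126ae8 T close_fp
c0126b6c T sys_close
c01475c0 T nfs_file_close
c0148bd4 T nfs_flush_dirty_pages
c01726b0 T rpc_clnt_sigunmask
"""


def parse_system_map(text):
    """Return a list of (start_address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(addr, symbols):
    """Map addr to (name, offset) of the nearest symbol at or below it.

    Returns None if addr lies below the first symbol in the table.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


symbols = parse_system_map(SAMPLE_MAP)

# Addresses taken from the OOPS trace above.
for addr in (0xC0148D55, 0xC01475F1, 0xC0126ABA, 0xC0126B37, 0xC0126BCC):
    name, offset = resolve(addr, symbols)
    print(f"{addr:08x} <{name}+{offset:x}>")
```

With the full System.map loaded instead of SAMPLE_MAP, the same resolve
call could be applied to the eight addresses printed in the wait_on_bh
trace. Note that the nearest-symbol rule gives a wrong name if the map
does not match the running kernel.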

+-----------CERN------------------+-----------private-----------------+
! Steffen Luitz ! !
! CERN / EP Division ! 199, Rue d'Allemogne !
! CH-1211 Geneve 23 ! F-01710 Thoiry !
! Tel.: +41 22 767 9878 ! Tel. +33 450205829 !
! +41 79 201 3186 (mobile) ! !
! e-mail: Steffen.Luitz@cern.ch ! !
+---------------------------------+-----------------------------------+

------- end of forwarded message -------
