I just received the report below from a colleague here at CERN. There
seems to be an SMP problem with NFS root in the latest kernel versions;
it occurs while the NFS root filesystem is being mounted.
I guess this is something that needs to be looked at before 2.2.
Any ideas?
Jes
------- start of forwarded message -------
From: Steffen Luitz <Steffen.Luitz@cern.ch>
To: Jes Sorensen <Jes.Sorensen@cern.ch>
Subject: More information ...
Date: Thu, 20 Aug 1998 15:49:36 +0200 (MET DST)
Hi Jes,
Our hardware configuration:
Tyan TahoeII, 2 x PII-266, 192 MByte, 2 x Adaptec 2940UW, 1 x EEPro100
Linux configuration:
Standard 2.1.117 without any patches
Drivers for 2940UW and EEPro100
Root file system on NFS with BOOTP IP autoconfiguration
MD driver
Serial console
When booting 2.1.117 I now also get an OOPS (I overlooked it this morning
because it had already scrolled off the screen behind the wait_on_bh
messages). After the OOPS the wait_on_bh messages start, and the system no
longer responds to the network. 2.1.116 does not give me the OOPS, just the
wait_on_bh messages; 2.1.114 worked reasonably well. I did not try 2.1.115.
I've included the OOPS and the wait_on_bh messages as well.
Cheers
Steffen
The OOPS (output of ksymoops):
Using `../System.map' to map addresses to symbols.
>>EIP: c0148d55 <nfs_flush_dirty_pages+181/1a4>
Trace: c01475f1 <nfs_file_close+31/58>
Trace: c0126aba <__fput+22/50>
Trace: c0126b37 <close_fp+4f/84>
Trace: c0126bcc <sys_close+60/7c>
Trace: c010afec <system_call+34/38>
Code: c0148d55 <nfs_flush_dirty_pages+181/1a4>
Code: c0148d55 <nfs_flush_dirty_pages+181/1a4> 8b 40 5c movl 0x5c(%eax),%eax
Code: c0148d58 <nfs_flush_dirty_pages+184/1a4> 8b 40 44 movl 0x44(%eax),%eax
Code: c0148d5b <nfs_flush_dirty_pages+187/1a4> 50 pushl %eax
Code: c0148d5c <nfs_flush_dirty_pages+188/1a4> e8 4f 99 02 00 call c01726b0 <rpc_clnt_sigunmask>
Code: c0148d61 <nfs_flush_dirty_pages+18d/1a4> 83 c4 08 addl $0x8,%esp
Code: c0148d64 <nfs_flush_dirty_pages+190/1a4> e9 af fe ff ff jmp fffffec3 <_EIP+fffffec3>
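For whoever picks this up, the Code: lines can be read back into C. The sketch below assumes the usual i386 calling convention (arguments pushed right to left, caller pops them) and uses hypothetical stand-in types (hyp_inner, hyp_outer, hyp_client) plus a stand-in hyp_sigunmask() for rpc_clnt_sigunmask; the real 2.1.117 structures are not in the report, so the offsets 0x5c and 0x44 are not reproduced. It illustrates that the faulting instruction at c0148d55 is the first of two dependent loads whose result becomes the first argument of rpc_clnt_sigunmask, so the pointer in %eax was already bad when this sequence started, and that "addl $0x8,%esp" implies a second 4-byte argument pushed before the dumped code.

```c
#include <assert.h>

/* Hypothetical stand-ins: NOT the real 2.1.117 types; the real field
 * offsets (0x5c and 0x44 in the dump) are not reproduced here. */
struct hyp_client { int id; };
struct hyp_outer  { struct hyp_client *client; }; /* read by "movl 0x44(%eax),%eax" */
struct hyp_inner  { struct hyp_outer *outer; };   /* read by "movl 0x5c(%eax),%eax" */

/* What the stand-in callee last received, so the argument order can be checked. */
static struct hyp_client *last_clnt;
static void *last_arg2;

/* Stand-in for rpc_clnt_sigunmask(): two pointer-sized arguments, which on
 * i386 is consistent with the caller's "addl $0x8,%esp". */
static void hyp_sigunmask(struct hyp_client *clnt, void *arg2)
{
    last_clnt = clnt;
    last_arg2 = arg2;
}

/* C reading of the dumped sequence, starting at the faulting EIP. */
static void hyp_faulting_tail(struct hyp_inner *p, void *arg2)
{
    /* arg2: presumably pushed before c0148d55; not visible in the dump. */

    /* movl 0x5c(%eax),%eax: this load faulted, so p itself was bad. */
    struct hyp_outer *o = p->outer;

    /* movl 0x44(%eax),%eax ; pushl %eax: pushed last, so first argument. */
    struct hyp_client *c = o->client;

    /* call rpc_clnt_sigunmask ; addl $0x8,%esp: caller pops both args. */
    hyp_sigunmask(c, arg2);
}
```

Which kernel structure the bad pointer belongs to is not visible in the dump; matching the two offsets against the 2.1.117 headers would settle it.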
... and later the wait_on_bh messages keep repeating:
...
wait_on_bh, CPU 0:
irq: 1 [0 1]
bh: 1 [0 1]
<[c0113d4b]> <[c017547e]> <[c01755ae]> <[c0174a39]> <[c0175c95]>
<[c0147c5c]> <[c0175824]> <[c0175919]>
wait_on_bh, CPU 0:
irq: 1 [0 1]
bh: 1 [0 1]
<[c0113d4b]> <[c017547e]> <[c01755ae]> <[c0174a39]> <[c0175c95]>
<[c0147c5c]> <[c0175824]> <[c0175919]>
...
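Assuming the i386 SMP interrupt code of this period prints, on each "irq:" and "bh:" line, a global count followed by the per-CPU counts for CPU 0 and CPU 1 (worth double-checking against the 2.1.117 source), CPU 0 is the one waiting while CPU 1 holds both an interrupt count and a bottom-half count. A small decoder for these lines, with hypothetical names (hyp_counts, hyp_parse, hyp_holder) and the two-CPU layout as an explicit assumption:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical decoded form of one "irq:" or "bh:" line, ASSUMING the layout
 * "<name>: <global> [<cpu0> <cpu1>]" on a two-CPU machine. */
struct hyp_counts {
    char name[8]; /* "irq" or "bh" */
    int global;   /* assumed: global count */
    int cpu[2];   /* assumed: local counts for CPU 0 and CPU 1 */
};

/* Parses e.g. "bh: 1 [0 1]"; returns 1 on success, 0 otherwise. */
static int hyp_parse(const char *line, struct hyp_counts *c)
{
    return sscanf(line, " %7[^:]: %d [%d %d]",
                  c->name, &c->global, &c->cpu[0], &c->cpu[1]) == 4;
}

/* Returns the CPU with a non-zero local count: 0 or 1, 2 if both, -1 if none. */
static int hyp_holder(const struct hyp_counts *c)
{
    if (c->cpu[0] && c->cpu[1])
        return 2;
    if (c->cpu[0])
        return 0;
    if (c->cpu[1])
        return 1;
    return -1;
}
```

Under that assumption, both lines in the report decode to CPU 1 as the holder.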
The addresses in the trace correspond to:
del_timer
__rpc_wake_up
rpc_wake_up_next
xprt_release
rpc_release_task
nfs_readpage_result
__rpc_execute
__rpc_schedule
(I looked these up manually, so hopefully I got them right.) ...
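The manual lookup can be automated roughly the way ksymoops does it: sort the symbols by start address and, for each trace address, take the last symbol that starts at or below it. The sketch below hard-codes a tiny illustrative table derived from the name+offset/size pairs in the OOPS above (for example, c0148d55 - 0x181 = c0148bd4 for nfs_flush_dirty_pages); the names hyp_sym and hyp_symbolize are hypothetical, and a real tool would load System.map instead. The wait_on_bh addresses fall outside this table, so they still need the real System.map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct hyp_sym {
    unsigned long start; /* first address of the function */
    unsigned long size;  /* length in bytes */
    const char *name;
};

/* Sorted by start address; start = address - offset from the OOPS lines. */
static const struct hyp_sym hyp_syms[] = {
    { 0xc010afb8UL, 0x38,  "system_call" },
    { 0xc0126a98UL, 0x50,  "__fput" },
    { 0xc0126ae8UL, 0x84,  "close_fp" },
    { 0xc0126b6cUL, 0x7c,  "sys_close" },
    { 0xc01475c0UL, 0x58,  "nfs_file_close" },
    { 0xc0148bd4UL, 0x1a4, "nfs_flush_dirty_pages" },
};

/* Writes "name+off/size" for addr into buf; returns 0 if no symbol covers it. */
static int hyp_symbolize(unsigned long addr, char *buf, size_t len)
{
    size_t lo = 0, hi = sizeof hyp_syms / sizeof hyp_syms[0];

    /* Binary search for the first entry whose start is above addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (hyp_syms[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0; /* below the first known symbol */

    const struct hyp_sym *s = &hyp_syms[lo - 1];
    if (addr - s->start >= s->size)
        return 0; /* past the end of the nearest symbol */

    snprintf(buf, len, "%s+%lx/%lx", s->name, addr - s->start, s->size);
    return 1;
}
```

Reproducing the exact strings ksymoops printed is a quick sanity check that the derived table is consistent.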
+-----------CERN------------------+-----------private-----------------+
! Steffen Luitz                   !                                   !
! CERN / EP Division              ! 199, Rue d'Allemogne              !
! CH-1211 Geneve 23               ! F-01710 Thoiry                    !
! Tel.: +41 22 767 9878           ! Tel. +33 450205829                !
!       +41 79 201 3186 (mobile)  !                                   !
! e-mail: Steffen.Luitz@cern.ch   !                                   !
+---------------------------------+-----------------------------------+
------- end of forwarded message -------