Again on the same problem with my SMP machine; last time (when I got
the message in the subject line) I was running with no swap activated (I
have 1Gb of memory so that works quite well).
However I've planned to be able to add swap space as more people may need to
access the machine, so wondering if the 'no-swap' configuration may be the
problem, I activated the swap partitions (eight 128Mb partitions, giving one
more Gb of virtual memory...) ; the I got an "Oops 0000" after the usual 36h
of work (and while heavily compiling a source code base on an NFS mounted
file system, placing result on a local disk).
I've attached the Oops file, both as I typed it from the screen and after
running through ksymoops. I use an rpm-packaged kernel, that's why I don't
have the System.map I presume.
I've also attached the relevant portion of the messages log files (from a
bit before the crash to a bit after end of reboot).
The configuration is as follows:
Intel C440GX+ board with no additional I/O board; configuration is:
2 PentiumIII Xeon 550MHz CPUs
1Gbyte of RAM
1 AIC7896 Adaptec PCI-SCSI interface supporting 2 SCSI buses
Video controller is only used as text-mode system console
1 Intel 82559 Fast Ethernet controller
2 9Gb SCSI disks are connected to the Ultra2 LVD controller
Floppy, ATAPI CDROM drive
No IDE hard disks
Full duplex 100MHz Ethernet connected to a switch port
Video, Serial mouse and Keyboard connected through a switch
Linux RedHat-6.1, kernel upgraded to version 2.2.14-8smp (as
provided by Ed Schlunder on
Maybe this may indicate what was happenning? or is it related directly to
the previous message? (seems possible: first I was not finding space for NFS
requests, then I crashed while swapping and the trace talks of network
While I was typing this, I saw the load average on the SMP box
climbing steadily; there is the following messages on the system console: (X
= hex digit)
NFS: can't silly-delete src/.nfsXXXXXXXXXXXXXXXXXXX, error -512
fh_verify: i386-unknown-mingw32/bin permission failure, acc=3,
nfs server canopus not responding, still trying
then the dreaded
nfs: task 37637 can't get a request slot
Note that the network works OK, I can ping from canopus (the Solaris-2.6
server) to altair (the Linux SMP box) with less than 1ms round trip times:
100Mhz full duplex link through a switch :-); these two machines even
exchange successfully timed time stamps and canopus was able to access
altair files through NFS, while deneb sirius and vega (other UNIX boxes, one
under Solaris, the other AIX or hpUX) can access canopus NFS-exported
At the time of the crash the only big NFS activity on canopus was a tape
generation on sirius (an RS6000/AIX box) that was reading the files from
canopus (the Solaris box) and went on without any problem; I'm quite sure
this is not the problem as the same crash also occurs at times where there
were no big NFS activity on canopus .
I join the tcpdump file (generated on the Linux box, I don't find tcpdump on
my Solaris-2.6 box...) showing what happens on the network while altair is
blocked (fileterd to show only packets exchanged with canopus or sirius; if
you need the unfiltered trace I can send it).
I don't know if these two problems are related but I'm pretty sure they
appear after I installed the 2nd processor (I was already using an SMP
kernel with one processor and it ran for approxilately 1 month without any
problem); I was then using the kernel-2.2.12-32 delivered on the RedHat-6.1
official CD, then switched to the 2.2.14-8 kernel listed above to see if
that solved my problem.
Thanks in advance for any help.
PS: I'm not subscribed to the list, so please CC. me of any answer; Thanks.
97 bis, rue de Colombes
Tel: +33 (0) 1 47 68 80 80
Fax: +33 (0) 1 47 88 97 85
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:24 EST