Possible bug in 2.0.0 (And later?)

Solitude (solitude@johnf.reshall.ksu.edu)
Sat, 1 Feb 1997 01:11:20 -0600 (CST)


I am writing partially to report a possible bug, and partially because I
am still in shock over what became of my ever-efficient OS...

I am running Linux kernel 2.0.0 on an Acer VI15G PCI motherboard with an
Intel Pentium 100 CPU. I have 40 megs of physical RAM and (had) 32 megs of
swap. This system was first built in July of '96, and aside from various
reboots it has been running non-stop since that time.

I entered the room and the first thing I heard was massive HD activity.
The console had error messages streaming by, effectively making it
unusable. I telnet'ed in and learned that syslog was reporting the
following error over and over:

Jan 31 15:52:38 johnf kernel: p_duplicate: trying to duplicate unused page
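
To see how badly a message like this is flooding the log, the repeats can
be tallied per message text. The following is only a minimal sketch in
Python: it assumes the classic "Mon DD HH:MM:SS host tag: message" syslog
layout shown above, and the function name count_kernel_messages is
hypothetical. The sample lines are copied from this post, with the first
one repeated once purely for illustration.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host tag: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+(?P<tag>[^:]+):\s*(?P<msg>.*)$"
)


def count_kernel_messages(lines):
    """Return a Counter mapping each kernel message text to its repeat count."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        # Only lines tagged "kernel" are counted; everything else is skipped.
        if match and match.group("tag") == "kernel":
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample only; a real run would read the syslog file instead.
    sample = [
        "Jan 31 15:52:38 johnf kernel: p_duplicate: trying to duplicate unused page",
        "Jan 31 15:52:38 johnf kernel: p_duplicate: trying to duplicate unused page",
        "Jan 31 15:59:26 johnf kernel: 000be600)",
    ]
    for message, count in count_kernel_messages(sample).most_common():
        print(count, message)
```

On a real system the lines would come from wherever syslogd writes kernel
messages; that path varies between setups, so it is left out here.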

I looked through the source and it turns out this message is supposed to
read 'swap_duplicate'. Anyway, back to the story... I immediately turned
off the swap space, then formatted and activated some new swap space on a
different drive I had. The problem seemed to be fixed. I rebooted onto
the rescue bootdisk and ran fsck and badblocks on all my filesystems, and
ran badblocks in write mode on all my swap partitions. Neither fsck nor
badblocks reported any problems. I re-ran mkswap -c on all my swap
partitions and rebooted. That was about 7-8 hours ago, and the system
hasn't had any problems since.
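
For anyone retracing the swap part of the steps above, the order of
commands can be sketched as below. This is a rough outline in Python and
not a tested recovery procedure: the function name swap_rebuild_plan and
the device names /dev/sda2 and /dev/sdb2 are made up for illustration, and
the script only prints the commands and never runs them. The fsck and
read-only badblocks passes over the filesystems are not included. Note that
badblocks -w overwrites the partition, so it is only safe on swap that has
already been turned off.

```python
import shlex


def swap_rebuild_plan(suspect_dev, spare_dev):
    """Return the commands, in order, as argument lists. Nothing is executed."""
    return [
        ["swapoff", suspect_dev],          # stop using the suspect swap area
        ["mkswap", spare_dev],             # format fresh swap on another drive
        ["swapon", spare_dev],             # activate the fresh swap
        ["badblocks", "-w", suspect_dev],  # destructive write-mode surface test
        ["mkswap", "-c", suspect_dev],     # re-create swap, checking for bad blocks
    ]


if __name__ == "__main__":
    # Hypothetical device names; substitute the real swap partitions.
    for argv in swap_rebuild_plan("/dev/sda2", "/dev/sdb2"):
        print(shlex.join(argv))
```

The last two commands correspond to the steps run from the rescue
bootdisk, after the reboot.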

Now: I'm no kernel hacker, but here are my own amateur observations:
There is no physical problem with the RAM, motherboard, disk, controller,
etc. (The controller is an AHA2940 and the RAM is Kingston.) If this were
a hardware problem then it would have surfaced much earlier than now.
(Especially if it were a RAM problem, as I regularly pound the hell out of
the RAM with various compiles.) I have also built copies of this system
with the same components for several other people, and no one else has
had any problems.

After reading through the memory-management source code, I believe that
the problem I had was due to a corrupted map table in the swap area. I
noticed the following syslog entry, which occurred just before I rebooted
the system:

Jan 31 15:59:26 johnf kernel: 000be600)

The only thing about the system that I ever thought was acting kind of
weird is that there seems to be only about a 5 second delay between the
SIGTERM and SIGKILL signals during a shutdown. I have tried using the -t
parameter on shutdown, but no matter what value I give it, the delay stays
very short. Otherwise, the system has always been solid as a rock, aside
from this one incident.

Well, if anyone has any ideas on this, I would like to be able to regain
my previous confidence in the stability of Linux. If this is a problem
that has been identified and corrected, then I will upgrade my kernel
immediately.

- John
<solitude@johnf.reshall.ksu.edu>
<jsf8471@ksu.edu>


NO SOLICITING!