FS corruption with 2.3.11

Bob_Tracy (rct@gherkin.sa.wlk.com)
Sat, 24 Jul 1999 11:22:00 -0500 (CDT)


Let me say up front that I acknowledge and understand the risks
associated with running development kernels. I'm not asking for
sympathy, nor do I want it :-(. I relate the following tale so
others can be made aware that a problem exists. It's beyond my
ability to fix.

2.3.11 had been up and running for just over 48 hours, during which I
noticed that the machine performance was slowly but surely declining.
New processes were taking longer to launch, and existing processes were
running *significantly* slower. In particular, "rc5des" normally
processes around 335 kkeys/sec on this machine, and performance had
dropped to around 107 kkeys/sec. The "top" command showed "update"
eating most of the available resources (followed by "rc5des"), whereas
in prior kernel releases, "update" doesn't even make it onto the list
that "top" displays. The normal "top" display on my machine shows
"rc5des", "top", and "X" in the top three positions most of the time.

Eventually, the machine locked up solid. The trigger event was when
I interrupted "rc5des" to take a look at its configuration. Had to
hit the reset switch after doing several things to verify that the
machine was really and truly dead (couldn't access via the network,
couldn't switch consoles, no disk activity for at least a minute, etc.).

The carnage that followed was beyond e2fsck's ability to handle without
manual intervention, which is the first time in many years of running
development kernels that that's *ever* happened. Most of the damage
was in the category of blocks assigned to multiple files, although
there were a few small files that went bye-bye entirely: those had been
created a few hours prior to the lockup.

The only good news is that everything seems to be back to normal, and
the few things that were lost entirely can be replaced. The files
containing duplicate blocks were typically logfiles, netscape cache
files, and/or others that are either unimportant or get recreated as
needed. For now, I'm back to running 2.3.6 which is the latest kernel
I would call well-behaved and stable.

-- 
Bob Tracy               |  "They couldn't hit an elephant at this dist- "
Firewall Security Corp. |   - Last words of Union General John Sedgwick,
rct@frus.com            |  Battle of Spotsylvania Court House, U.S. Civil War

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/