Observing the scheduler to detect sources of system crashes

Tobias Haustein (haustein@informatik.rwth-aachen.de)
Thu, 7 Oct 1999 13:48:38 +0200


Hi!

I've got a problem with my Linux box. Every now and then, the system
just hangs. Sometimes this happens after 20 minutes; sometimes the
system crashes after an uptime of 10 days. I have swapped nearly every
piece of software and hardware, but the crashes continue. By now I
suspect a software problem. I'm currently using a SUSE 6.1
distribution and a 2.2.7 kernel. I've used different X servers with
different graphics cards.

If this is a software problem, there is one way to find it: keep a
record of the currently running process. If the crash occurs in the
same process over and over again, that process should be the bad guy
(I treat the kernel as a process, too). The only crashes this method
would not catch are those triggered by DMA transfers. These are
unlikely, given that I have already changed the hardware.
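
As a rough illustration of the evaluation step, here is a minimal sketch
in C. It assumes that the last PID logged before each crash has already
been collected into an array; the function name most_frequent_last_pid
and the array layout are hypothetical and not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: last_pids[i] holds the final PID that was logged
 * before crash number i (0 stands for the kernel itself). Returns the
 * PID that occurs most often and stores its count in *count_out if that
 * pointer is not NULL. On a tie, the PID that appears first wins.
 */
int most_frequent_last_pid(const int *last_pids, size_t n, size_t *count_out)
{
    assert(last_pids != NULL && n > 0);

    int best_pid = last_pids[0];
    size_t best_count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t count = 0;

        /* Count how many crashes ended in the same PID as crash i. */
        for (size_t j = 0; j < n; j++) {
            if (last_pids[j] == last_pids[i])
                count++;
        }
        if (count > best_count) {
            best_count = count;
            best_pid = last_pids[i];
        }
    }

    if (count_out != NULL)
        *count_out = best_count;
    return best_pid;
}
```

If one PID dominates the tally after a handful of crashes, that process
(or the kernel, for PID 0) would be the first suspect.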

What I want to know is whether some bookkeeping feature or patch for
this is already available. If not, I would implement something like
this:

The kernel writes the PID of the active process or the pseudo-PID 0
to a serial port that runs at some high speed. When a new process is
created, the name and parameters of this process are written to the
line, too. On my machine, there are 100 timer interrupts per second.
Other interrupts occur less often. Therefore I assume that fewer than
200 PIDs per second have to be written out. Since a PID is four bytes,
it fits in the FIFO, and the data rate of about 800 bytes per second
should be no problem.
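
The arithmetic above can be checked with a small user-space sketch in C.
Several things here are assumptions on my part and not fixed by the
proposal: the big-endian byte order of the record, a 16-byte FIFO as
found on 16550A-style UARTs, a line speed of 115200 baud, and 8N1
framing (10 bits on the wire per byte). All function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PID_RECORD_BYTES  = 4,  /* from the text: one PID is four bytes */
    UART_FIFO_BYTES   = 16, /* assumption: 16550A-style UART FIFO */
    BITS_PER_BYTE_8N1 = 10  /* assumption: start bit + 8 data + stop bit */
};

/* Encode a PID as four bytes, most significant byte first (assumed order). */
void encode_pid(uint32_t pid, uint8_t out[PID_RECORD_BYTES])
{
    assert(out != NULL);
    out[0] = (uint8_t)(pid >> 24);
    out[1] = (uint8_t)(pid >> 16);
    out[2] = (uint8_t)(pid >> 8);
    out[3] = (uint8_t)pid;
}

/* Bytes per second needed to log the given number of PIDs per second. */
unsigned long required_bytes_per_sec(unsigned long pids_per_sec)
{
    return pids_per_sec * PID_RECORD_BYTES;
}

/* Payload bytes per second the serial line can carry with 8N1 framing. */
unsigned long line_bytes_per_sec(unsigned long baud)
{
    return baud / BITS_PER_BYTE_8N1;
}
```

With these assumptions, required_bytes_per_sec(200) gives 800 bytes per
second, while line_bytes_per_sec(115200) gives 11520 bytes per second,
so the PID stream would use roughly 7% of the line. The process names
and parameters written at process creation come on top of that and are
not counted here.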

What do you think about this approach?

Ciao,

Tobias

-- 
Dipl. Inform. Tobias Haustein

Department of Computer Science IV, Aachen University of Technology
Ahornstr. 55, D-52056 Aachen
Phone +49 (241) 80-21417, Fax +49 (241) 8888-220
E-Mail haustein@informatik.rwth-aachen.de
Web http://www-i4.informatik.rwth-aachen.de/~haustein/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/