Re: kfaultd report

Michael L. Galbraith (mikeg@mikeg.weiden.de)
Tue, 4 Feb 1997 22:14:04 +0100 (MET)


On Sun, 2 Feb 1997, Andrew E. Mileski wrote:

> Well I finally managed to get Mingo's kfaultd patch working - it applies
> cleanly to v2.1.20, but doesn't compile without some changes.
>
> kfaultd triggered after a vast amount of disk activity (ie. it was
> a lot harder to cause a kernel stack overflow than usual IMHO).
>
> Unfortunately, it _really_ fried my system, and the only message
> I was able to recover was:
>
> [double fault detected,
> error code:00000000, ESP:C2005FD8 backlink:0 EIP:C01A73EA]
>
> EIP: aic7xxx_isr + 0x6
>
> Hope this is helpful.
>
> --
> Andrew E. Mileski mailto:aem@netcom.ca
> Linux Plug-and-Play Kernel Project http://www.redhat.com/linux-info/pnp/
> XFree86 Matrox Team http://www.bf.rmit.edu.au/~ajv/xf86-matrox.html
>
This is strange. I put Ingo's patch into my system the day he released it
and never had a problem with it. Unfortunately, it didn't do me any good,
simply because I can't seem to trigger stack corruption except in 2.0.2x
with the Pentium memcpy patch installed, and even then only when I fiddle
with vfat filesystems (a known race). There is no way that I can see that
kfaultd can possibly trash a system... all it does is trigger an oops.
Maybe you were saying that the stack corruption ate your system. It may,
of course, if you try to continue running after corruption exists.
You only get one legitimate shot at a problem like this: either it's
nailed down, or you start over.

If you know of a way to trigger a stack overflow, I'd be interested. The
purpose of kfaultd is exactly to nail this problem down. Notice that
no one has submitted an oops where SCSI disk activity is at max? Alan
specifically called for input on this and has received none. If you know
of a way to reproduce this problem, let me know. I have a drive that I can
afford to trash... I mean _destroy_ (it's seriously wounded anyway).
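
For anyone who does capture an oops or double-fault line, turning the raw
EIP into "symbol + offset" (as in "aic7xxx_isr + 0x6" above) is just a
lookup against the kernel's System.map. A rough sketch, not a real tool:
the only address taken from the report is aic7xxx_isr, derived as EIP
C01A73EA minus the 0x6 offset; the two neighbouring symbols are made-up
placeholders, and a real lookup would read the System.map that matches
the running kernel.

```python
import bisect

# System.map-style lines: "<hex address> <type> <symbol>".
# Only the aic7xxx_isr address comes from the report (EIP C01A73EA - 0x6);
# the other two entries are hypothetical placeholders for illustration.
SYSTEM_MAP = """\
c01a7000 T hypothetical_prev_func
c01a73e4 T aic7xxx_isr
c01a7a00 T hypothetical_next_func
"""


def parse_map(text):
    """Return a sorted list of (address, symbol) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, eip):
    """Find the last symbol at or below eip; return (name, offset) or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return name, eip - base


symbols = parse_map(SYSTEM_MAP)
name, offset = resolve(symbols, 0xC01A73EA)
print(f"{name} + {offset:#x}")  # prints "aic7xxx_isr + 0x6"
```

The lookup only says which function the EIP fell in; it says nothing by
itself about how deep the stack was at the time.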

-Mike

What were you doing at the time of stack overflow/corruption?