Re: Possible dcache BUG

From: Gene Heskett
Date: Thu Aug 05 2004 - 21:07:07 EST


On Thursday 05 August 2004 12:26, Linus Torvalds wrote:
>On Thu, 5 Aug 2004, Denis Vlasenko wrote:
>> It is not a BUG().
>
>Oh yes it is. The 2.6.8-rc2-mm2 report definitely was a BUG().
>
>Earlier ones may not have been, but on the other hand, earlier ones
> may not have had the BUG()-check for corrupted list_del() usage -
> it's not in the standard kernel, and I don't know when it was added
> to -mm. (We used to have it a _long_ time ago, but then we removed
> it because there were no reports of problems).
>
>> It's an oops (dereferencing a d_op pointer with value
>> 0x00000900+14 IIRC, Gene has complete disassembly with location of
>> that event).
>
>.. and that must be because of some kind of pointer corruption,
> where the dentry was either free'd twice or the dentry simply isn't
> a dentry at all, it just got to be used as such because of some
> bug.
>
>> It is not reproducible on request, but happens for him from time
>> to time in the same place with the same bogus value of d_op.
>
>I've followed the discussion. You may not have noticed that the last
> one was different. (And I _think_ it may hav ebeen the first time
> Gene did a -mm kernel, so I do believe that the list_del()
> debugging was the thing that caught it).
>
>Anyway, one other thing that makes me worry is the fact that Gene
>apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one
> of the users of the prefetch instruction. prude_dcache() in
> particular.
>
>So I'm also entertaining the notion that there's an actual prefetch
> data corruption, not just the known AMD bug with occasional
> spurious page faults. Who else has seen the problem? What CPU's are
> involved?
>
> Linus

Two things that may be of interest: 1, I do run the -mm kernels too as
I figure the more they get excersized, the quicker some fault will be
found. This included the whole chain of 2.6.7's & 2.6.8-rc1/2-mm1/2.
In this case all hell broke loose here while everyone was at the
conventions. Not your fault, but I was "grabbing a life jacket"
here.

2. with the patch Linus sent, top is now showing 383 megs of free ram.
I suspect that without that patch, it might well be less than 15 megs
after a many (10+) hour uptime. So this patch is definitely more
aggressive in its memory housekeeping than without it.

And everything is on except the acpi stuffs, which has ever interested
me here. I do use apm, but only for shutdowns & rtc in utc.
PREEMPT, 4k stacks, page tables in high mem, frame pointers etc, its
all on right now. So far, I haven't even heard the first shoe
drop. :)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/