Re: Possible dcache BUG

From: Linus Torvalds
Date: Fri Aug 06 2004 - 12:33:04 EST




On Fri, 6 Aug 2004, Ingo Molnar wrote:
>
> last night i ran another overnight test: 2.6.8-rc3-vanilla with
> CONFIG_PREEMPT enabled and no other changes. I've also reduced the CPU's
> clock speed by 5% to reduce the chance of hw problems. The crash below
> triggered after roughly 12 hours of runtime. I've also attached the full
> disassembly of __d_lookup(). The crash happens in hlist_for_each():
>
> c01632f3:  8d b6 00 00 00 00       lea    0x0(%esi),%esi
> c01632f9:  8d bc 27 00 00 00 00    lea    0x0(%edi,1),%edi
> c0163300:  8b 03                   mov    (%ebx),%eax  <==== [*]
>
> the crashing instruction is preceded by two prefetch instructions (the
> disassembly has the alternate-insn NOP).

That's not right.

The prefetchnta instruction is three or four bytes long (four if it uses
the ebp register, which needs the "0(ebp)" modrm format).

We use a NOP4 for space in there, and the things you point to are a
NOP6+NOP7 pair.

Your two nops are the ones gcc inserted in order to start the loop at
a 16-byte boundary (i.e. c0163300 is the top of the loop). The nop that gets
replaced by a prefetch is the instruction _after_ the one that faulted for
you:

    8b 03          mov    (%ebx),%eax
    8d 74 26 00    lea    0x0(%esi,1),%esi

I think.
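
The byte counting behind that argument can be checked mechanically. The sketch below is only an illustration: the byte sequences and the start address are copied from the disassembly quoted above, while the helper names (loop_top, is_aligned, slot_size, print_layout) are made up for this example and are not kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Instruction bytes copied from the quoted disassembly. */
static const uint8_t pad1[] = { 0x8d, 0xb6, 0x00, 0x00, 0x00, 0x00 };       /* lea 0x0(%esi),%esi   */
static const uint8_t pad2[] = { 0x8d, 0xbc, 0x27, 0x00, 0x00, 0x00, 0x00 }; /* lea 0x0(%edi,1),%edi */
static const uint8_t slot[] = { 0x8d, 0x74, 0x26, 0x00 };                   /* lea 0x0(%esi,1),%esi */

#define PAD_START 0xc01632f3u   /* address of the first padding lea */
#define ALIGNMENT 16u           /* loop alignment claimed in the text */

/* Address just past the two padding instructions. */
uint32_t loop_top(void)
{
	return PAD_START + (uint32_t)(sizeof(pad1) + sizeof(pad2));
}

/* Nonzero if addr sits on an ALIGNMENT-byte boundary. */
int is_aligned(uint32_t addr)
{
	return (addr % ALIGNMENT) == 0;
}

/* Size of the nop that follows the faulting mov. */
size_t slot_size(void)
{
	return sizeof(slot);
}

/* Print the lengths and the resulting loop-top address. */
void print_layout(void)
{
	printf("padding: %zu + %zu bytes, first opcode bytes %02x/%02x\n",
	       sizeof(pad1), sizeof(pad2),
	       (unsigned)pad1[0], (unsigned)pad2[0]);
	printf("loop top: %08x (aligned: %d)\n",
	       (unsigned)loop_top(), is_aligned(loop_top()));
	printf("candidate prefetch slot: %zu bytes, first opcode byte %02x\n",
	       slot_size(), (unsigned)slot[0]);
}
```

Calling print_layout() from a small main() shows the 6+7 byte padding ending exactly at c0163300, and only the 4-byte lea having the size of a NOP4 slot.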

> to me this crash seems to imply prefetch.

I don't think it's obvious yet. It's close to the prefetch, but it's the
instruction just before. Which in an OoO CPU doesn't necessarily mean
much, of course - or it could be that the prefetch caused some trouble
last time around the loop and we only see it now.
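
To make the "load, then prefetch" ordering concrete, here is a rough userspace sketch of a list walk with a prefetch hint. It is not the kernel's hlist macro or the real __d_lookup(): struct hnode and find_key() are hypothetical stand-ins, and __builtin_prefetch is the GCC/Clang builtin, used here on the assumption that locality 0 approximates a non-temporal prefetch.

```c
#include <stddef.h>

/* Minimal stand-in for a hash-chain node; 'next' sits at offset 0. */
struct hnode {
	struct hnode *next;
	int key;
};

/* Walk the chain, hinting the following node before examining this one. */
const struct hnode *find_key(const struct hnode *first, int key)
{
	const struct hnode *pos;

	for (pos = first; pos != NULL; pos = pos->next) {
		/* A real load: if 'pos' is a corrupted pointer, this can fault. */
		const struct hnode *next = pos->next;

		/* Only a hint: a prefetch is not supposed to fault, even when
		 * 'next' is NULL or otherwise invalid. */
		__builtin_prefetch(next, 0, 0);

		if (pos->key == key)
			return pos;
	}
	return NULL;
}
```

In this shape the instruction that can architecturally fault is the load of pos->next, with the hint issued right after it, which matches the ordering described above without saying anything about which of the two is actually responsible.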

Or it could be totally prefetch-unrelated. I do find the prefetch thing
intriguing, though.

Linus