Re: kernel cache mem bug(?)

From: kernel
Date: Fri Mar 17 2006 - 01:09:51 EST


> On Thu, 16 Mar 2006 18:41:02 +0100, kernel@xxxxxxxxxxx said:
> > [X.] Other notes, patches, fixes, workarounds:
>
> > Workaround: When we disable HyperThreading in BIOS, this
> > problem goes away. When we re-enable HT, it comes back...
>
> Have you ruled out marginal memory, or overclocking/overheating?

No overclocking done, no overheating happening. All memory checks
turn out just fine, with and without HT.

> For that matter, this looks racy:
>
> while [ 0$MD5LOOPS -gt 0 ]; do
> md5sum cache.*-*-* >> md5log.$PID.lis
> MD5LOOPS=`expr 0$MD5LOOPS - 1`
> done &
>
> AMNT=`awk '$1!="91b82dcc83230890dbcdfc6b80571ddd"' md5log.$PID.lis | wc -l`
>
> If md5sum uses stdio to write the output, and it writes more than 4K or so,
> it's possible you can get a partial line right at the buffer boundary,
> which will then come up as a mismatch according to the awk. You might want
> to actually output the mismatched line in its entirety, and make sure you're
> looking at a complete line, and not a partial....

Well, the script itself was not how we noticed the problem. It was
created to see if the problem was reproducible. It was, under the
conditions mentioned.

Regarding a possible race condition, the log file (md5log.*.lis) is left
for manual inspection if needed.

The following line guards against this (possible) event:
AMNT=`awk 'length($1)==32 && $1!="91b82dcc83230890dbcdfc6b80571ddd"' \
md5log.$PID.lis | wc -l`
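
To illustrate what the length check changes, here is a minimal sketch run
against a made-up log. The temporary file, the cache.1-1-* names and the
second digest are hypothetical sample data, not taken from our real logs.
It counts mismatches with the original awk expression and with the guarded
one:

```shell
#!/bin/sh
# Hypothetical sample data: one good line, one real mismatch, and one
# truncated line such as a partial write might leave behind.
GOOD=91b82dcc83230890dbcdfc6b80571ddd
LOG=$(mktemp)
{
  printf '%s  cache.1-1-1\n' "$GOOD"
  printf '%s  cache.1-1-2\n' "0123456789abcdef0123456789abcdef"
  printf '%s\n' "91b82dcc8323"
} > "$LOG"

# Original check: every line whose first field differs is counted.
OLD=$(awk -v good="$GOOD" '$1 != good' "$LOG" | wc -l)

# Guarded check: only first fields of full digest length (32) are compared.
NEW=$(awk -v good="$GOOD" 'length($1) == 32 && $1 != good' "$LOG" | wc -l)

# $(( )) strips any padding that some wc implementations emit.
echo "old=$((OLD)) new=$((NEW))"
rm -f "$LOG"
```

With this sample the original expression counts the truncated line as a
mismatch, while the guarded one counts only the complete, differing digest.
The guard only filters on the length of the first field, so the log should
still be inspected by hand if in doubt.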

//D
