Re: [BUG] long freezes on thinkpad t60

From: Ingo Molnar
Date: Thu Jun 21 2007 - 16:31:01 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> > 	for (;;) {
> > 		for (i = 0; i < loops; i++) {
> > 			if (__raw_write_trylock(&lock->raw_lock))
> > 				return;
> > 			__delay(1);
> > 		}
>
> What a piece of crap.
>
> Anybody who ever waits for a lock by busy-looping over it is BUGGY,
> dammit!
>
> The only correct way to wait for a lock is:
>
> (a) try it *once* with an atomic r-m-w
> (b) loop over just _reading_ it (and something that implies a memory
> barrier, _not_ "__delay()". Use "cpu_relax()" or "smp_rmb()")
> (c) rinse and repeat.

damn, i first wrote up an explanation about why that ugly __delay(1) is
there (it almost hurts my eyes when i look at it!) but then deleted it
as superfluous :-/

really, it's not because i'm stupid (although i might still be stupid
for other reasons ;-), it wasn't there in earlier spin-debug versions. We
even had an inner spin_is_locked() loop at one stage (and should add it
again).
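
for illustration, a rough userspace sketch of the quoted (a)/(b)/(c)
scheme with an inner read-only loop. It uses C11 atomics instead of the
kernel primitives, and sketch_lock_t and the sketch_*() helpers are
made-up names for this example, not the spin-debug code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace lock, not a kernel type: 0 = free, 1 = held. */
typedef struct { atomic_int held; } sketch_lock_t;

/* (a) one atomic read-modify-write attempt */
static bool sketch_trylock(sketch_lock_t *l)
{
	return atomic_exchange_explicit(&l->held, 1,
					memory_order_acquire) == 0;
}

/* read-only check, playing the role of spin_is_locked() */
static bool sketch_is_locked(sketch_lock_t *l)
{
	return atomic_load_explicit(&l->held, memory_order_relaxed) != 0;
}

static void sketch_lock(sketch_lock_t *l)
{
	for (;;) {
		if (sketch_trylock(l))
			return;
		/* (b) wait using plain reads only, no r-m-w traffic */
		while (sketch_is_locked(l))
			;	/* kernel code would call cpu_relax() here */
		/* (c) the lock looked free: go back and retry the r-m-w */
	}
}

static void sketch_unlock(sketch_lock_t *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

the point of the inner loop is that waiters only read the lock word
while it is held, and issue the atomic operation again only once it
looks free.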

the reason for the __delay(1) was really mundane: to be able to figure
out when to print a 'we locked up' message to the user. If the timeout
is 1 second, it causes false positives on some systems. If it's 10
minutes, people press reset before we print out any useful data. It used
to be just a loop of rep_nop()s, but that was hard to calibrate: on
certain newer hardware it was triggering in as little as 2 seconds,
causing many false positives. We cannot use jiffies nor any other
clocksource in this debug code.

so i settled for the butt-ugly but working __delay(1) thing, to be able
to time the debug messages.
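
a hedged userspace sketch of that timing idea: a fixed per-iteration
delay makes the loop count translate into a roughly predictable wait
before the message is printed. Everything named dbg_* here is
hypothetical, dbg_delay() is only a stand-in for __delay(1), and the
max_reports bail-out is an assumption added so the example terminates:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace lock, not a kernel type: 0 = free, 1 = held. */
typedef struct { atomic_int held; } dbg_lock_t;

static bool dbg_trylock(dbg_lock_t *l)
{
	return atomic_exchange_explicit(&l->held, 1,
					memory_order_acquire) == 0;
}

/* Stand-in for __delay(1): a small, fixed amount of work per call. */
static void dbg_delay(void)
{
	for (volatile int i = 0; i < 100; i++)
		;
}

/*
 * Try the lock 'loops' times with a delay between attempts, then print
 * a report. Returns the number of reports printed before the lock was
 * taken, or -1 after max_reports reports (assumption: bail out so the
 * sketch cannot hang).
 */
static int dbg_lock(dbg_lock_t *l, unsigned long loops, int max_reports)
{
	int reports = 0;

	for (;;) {
		for (unsigned long i = 0; i < loops; i++) {
			if (dbg_trylock(l))
				return reports;
			dbg_delay();
		}
		fprintf(stderr, "possible lockup: still held after %lu tries\n",
			loops);
		if (++reports >= max_reports)
			return -1;
	}
}
```

with a known cost per iteration, picking 'loops' amounts to picking the
timeout, which a bare loop of rep_nop()s could not guarantee across
hardware.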

Ingo