Re: Serious problem with ticket spinlocks on ia64

From: Tony Luck
Date: Thu Sep 02 2010 - 20:06:55 EST


Today's experiments were inspired by Petr's comment at the start of this thread:

"Interestingly, CPU 5 and CPU 7 are both granted the same ticket"
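For reference, this is the property that quote says was violated. In a ticket lock each waiter draws its number with an atomic fetch-and-add, so two cpus should never be handed the same ticket. The sketch below is a minimal portable version using C11 atomics. It is NOT the ia64 implementation, which packs both counters into one word. The names tl_lock/tl_unlock are hypothetical, and returning the ticket from tl_lock is only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical portable ticket lock; counters kept in separate words. */
struct ticket_lock {
	atomic_uint next_ticket;  /* next ticket to hand out */
	atomic_uint now_serving;  /* ticket currently allowed in */
};

/* Draw a ticket and spin until it is served. The atomic fetch-add gives
 * every caller a distinct value (until the counter wraps). */
static unsigned int tl_lock(struct ticket_lock *l)
{
	unsigned int ticket = atomic_fetch_add_explicit(&l->next_ticket, 1,
							memory_order_relaxed);

	/* Acquire pairs with the release store in tl_unlock(). */
	while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != ticket)
		;  /* a real lock would pause here (cpu_relax()-style) */
	return ticket;
}

/* Hand the lock to the next ticket holder. Only the owner writes
 * now_serving, so a relaxed load plus a release store is enough. */
static void tl_unlock(struct ticket_lock *l)
{
	unsigned int cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);

	/* Debug check: now_serving == next_ticket means no ticket is
	 * outstanding, i.e. nobody holds the lock we are releasing. */
	assert(cur != atomic_load_explicit(&l->next_ticket, memory_order_relaxed));
	atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}
```

Keeping the two counters in separate words makes the invariant easy to see. The ia64 lock gets the same effect with a single fetchadd on a packed word.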

I added an "owner" element to every lock. I have 32 cpus, so I made
it an "unsigned int" with one bit per cpu. Then I added code to the
lock and trylock paths to check that owner was 0 when the lock was
granted, followed by: lock->owner |= (1u << cpu); In the unlock path
I check that just the (1u << cpu) bit is set before doing:
lock->owner &= ~(1u << cpu);
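A user-space sketch of the owner-bitmask check described above. Only the extra owner field is modelled; the ticket lock word itself is left out. The names dbg_lock, dbg_note_acquire and dbg_note_release and the 0/-1 return convention are hypothetical. In the kernel the report would more likely be a printk or BUG_ON. The bit updates are plain, non-atomic read-modify-writes because they are meant to run inside the critical section. If two cpus really do hold the lock at once, those updates can race, so some overlaps may go unreported. Any report that does appear still indicates a real violation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debug wrapper: bit N set => cpuN believes it holds the lock. */
struct dbg_lock {
	unsigned int owner;
};

/* Call right after the lock or trylock path has been granted the lock.
 * Returns 0 if no other cpu claimed it, -1 (and reports) otherwise. */
static int dbg_note_acquire(struct dbg_lock *l, unsigned int cpu)
{
	assert(cpu < 32);  /* one bit per cpu in a 32-bit unsigned int */
	if (l->owner != 0) {
		fprintf(stderr, "cpu%u: granted lock but owner is already 0x%x\n",
			cpu, l->owner);
		return -1;
	}
	l->owner |= (1u << cpu);
	return 0;
}

/* Call in the unlock path before the lock is released.
 * Returns 0 if exactly our bit is set, -1 (and reports) otherwise. */
static int dbg_note_release(struct dbg_lock *l, unsigned int cpu)
{
	assert(cpu < 32);
	if (l->owner != (1u << cpu)) {
		fprintf(stderr, "cpu%u: unlocking but owner is 0x%x\n",
			cpu, l->owner);
		return -1;
	}
	l->owner &= ~(1u << cpu);
	return 0;
}
```

As a usage example, the cpu0/cpu28 report below corresponds to cpu0 passing dbg_note_acquire() and then cpu28 being granted the same lock while the owner field is still 0x1.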

In my first test I got a hit. cpu28 had failed to get the lock and was
spinning holding ticket "1". When "now serving" hit 1, cpu28 saw that
the owner field was set to 0x1, indicating that cpu0 had also claimed
the lock. The lockword was 0x20002 at this point ... so cpu28 was
correct to believe that the lock had been freed and handed to it. It
was unclear why cpu0 had muscled in and set its bit in the owner
field. I also can't tell whether that was a newly allocated lock or
one whose ticket counters had recently wrapped around.
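To make the lockword reading concrete, the sketch below decodes a lock word into its two ticket fields. It assumes the ia64 ticket lock layout of this period: next_ticket in bits 0-14, pad bits 15-16, and now_serving in bits 17-31 (TICKET_SHIFT 17, TICKET_BITS 15). Treat that layout as an assumption rather than a quote of the kernel header. The helper names are hypothetical.

```c
#include <assert.h>

/* Assumed ia64 ticket lock layout:
 *   bits 31..17 now_serving | bits 16..15 pad | bits 14..0 next_ticket */
#define TICKET_SHIFT 17
#define TICKET_BITS  15
#define TICKET_MASK  ((1u << TICKET_BITS) - 1)

/* Ticket the next fetchadd on the lock word will hand out. */
static unsigned int next_ticket(unsigned int word)
{
	return word & TICKET_MASK;
}

/* Ticket that currently owns the lock. */
static unsigned int now_serving(unsigned int word)
{
	return (word >> TICKET_SHIFT) & TICKET_MASK;
}

/* A waiter holding 'ticket' owns the lock once now_serving matches it. */
static int ticket_granted(unsigned int word, unsigned int ticket)
{
	assert(ticket <= TICKET_MASK);  /* tickets are 15-bit values */
	return now_serving(word) == ticket;
}
```

Under that layout 0x20002 decodes to now_serving = 1 and next_ticket = 2. This matches cpu28, holding ticket 1, being the cpu entitled to the lock.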

Subsequent tests have failed to reproduce that result - the system just
hangs without complaining about multiple cpus owning the same lock at
the same time - perhaps because of the extra tracing I included to
capture more details.

-Tony