Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync

From: Vikram Mulukutla
Date: Fri Jul 28 2017 - 15:11:43 EST


On 2017-07-28 02:28, Peter Zijlstra wrote:
> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>
>> I think we should have this discussion now - I brought this up earlier [1]
>> and I promised a test case that I completely forgot about - but here it
>> is (attached). Essentially a Big CPU in an acquire-check-release loop
>> will have an unfair advantage over a little CPU concurrently attempting
>> to acquire the same lock, in spite of the ticket implementation. If the Big
>> CPU needs the little CPU to make forward progress: livelock.

> This needs to be fixed in hardware. There really isn't anything the
> software can sanely do about it.
>
> It also doesn't have anything to do with the spinlock implementation.
> Ticket or not, it's a fundamental problem of LL/SC. Any situation where
> we use atomics for fwd progress guarantees this can happen.


Agreed, it seems like trying to build a fair SW protocol over unfair HW.
But if we can minimally change such loop constructs to address this (all
instances I've seen so far use cpu_relax) it would save a lot of hours
spent debugging these problems. Lots of b.L (big.LITTLE) devices out there :-)
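
For illustration, the acquire-check-release pattern can be sketched in
userspace C roughly as below. This is not the attached test case: the
ticket lock, setter(), poll_until_done() and run_demo() are hypothetical
names, and two pthreads stand in for the big and little CPUs. On hardware
with fair exclusive access the loop finishes quickly; the concern on b.L
is that the polling side keeps winning the cacheline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal ticket lock: a userspace stand-in, not the kernel's spinlock. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Taking a ticket is itself an atomic RMW (LL/SC on many ARM cores). */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;   /* spin until our ticket is served */
}

static void ticket_lock_release(struct ticket_lock *l)
{
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

static struct ticket_lock lock;   /* zero-initialised: unlocked */
static bool done;                 /* protected by lock */

/* "little" side: needs the lock once to publish done. */
static void *setter(void *arg)
{
    (void)arg;
    ticket_lock_acquire(&lock);
    done = true;
    ticket_lock_release(&lock);
    return NULL;
}

/* "big" side: acquire-check-release loop; returns the iteration count. */
static unsigned long poll_until_done(void)
{
    unsigned long iters = 0;

    for (;;) {
        bool seen;

        ticket_lock_acquire(&lock);
        seen = done;
        ticket_lock_release(&lock);
        iters++;
        if (seen)
            return iters;
        /* A kernel loop would typically call cpu_relax() here. */
    }
}

/* Run one setter against one poller; returns the poller's iterations. */
static unsigned long run_demo(void)
{
    pthread_t t;
    unsigned long iters;
    int rc = pthread_create(&t, NULL, setter, NULL);

    assert(rc == 0);
    iters = poll_until_done();
    pthread_join(t, NULL);
    return iters;
}
```

Calling run_demo() returns how many times the poller went around the loop
before it observed done; build with -pthread.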

It's also possible that such a workaround may help contention performance
since the big CPU may have to wait for say a tick before breaking out of
that loop (the non-livelock scenario where the entire loop isn't in a
critical section).
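
One possible shape for such a workaround, purely as a sketch: back off for
a growing, bounded number of spins between release and the next acquire,
so the poller stays off the lock's cacheline for a while. The names
backoff_spins(), relax_with_backoff() and poll_flag(), and the 1024-spin
cap, are assumptions made up for this example, not an existing kernel API
or the change proposed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BACKOFF_MIN_SPINS 1u
#define BACKOFF_MAX_SPINS 1024u   /* arbitrary cap, an assumption */

/* Idle spins before retry number `attempt` (0-based): doubles up to the cap. */
static unsigned int backoff_spins(unsigned int attempt)
{
    unsigned int spins = BACKOFF_MIN_SPINS;

    while (attempt-- > 0 && spins < BACKOFF_MAX_SPINS)
        spins <<= 1;
    return spins;
}

/* Hypothetical stand-in for a cpu_relax() that leaves the lock alone. */
static void relax_with_backoff(unsigned int attempt)
{
    for (volatile unsigned int i = backoff_spins(attempt); i > 0; i--)
        ;   /* burn time without touching the lock's cacheline */
}

/* Poll *flag under a test-and-set lock; returns the number of attempts. */
static unsigned int poll_flag(atomic_flag *lock, const bool *flag)
{
    unsigned int attempt = 0;

    assert(lock && flag);
    for (;;) {
        bool seen;

        while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
            ;   /* spin until the lock is free */
        seen = *flag;
        atomic_flag_clear_explicit(lock, memory_order_release);
        if (seen)
            return attempt + 1;
        relax_with_backoff(attempt++);   /* give other waiters a window */
    }
}
```

Whether a delay like this is acceptable depends on how latency-sensitive
the polling side is; the cap bounds the worst-case added wait per retry.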

> The little core (or really any core) should hold on to the locked
> cacheline for a while and not insta-relinquish it, giving it a chance to
> reach the SC.
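
To make "reach the SC" concrete, here is a small C11 sketch of an atomic
update written as a retry loop. On LL/SC architectures a weak
compare-exchange is typically lowered to a load-exclusive/store-exclusive
pair, and the store (the SC) fails if the cacheline was lost in between.
add_with_retries() is a hypothetical helper for this example only.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Atomically add delta to *v using a compare-exchange retry loop.
 * Returns the new value; *retries (if non-NULL) gets the number of
 * failed attempts, i.e. how often the "SC" side did not succeed.
 */
static unsigned int add_with_retries(atomic_uint *v, unsigned int delta,
                                     unsigned int *retries)
{
    unsigned int old;
    unsigned int failed = 0;

    assert(v);
    old = atomic_load_explicit(v, memory_order_relaxed);
    /* On failure, old is reloaded with the current value of *v. */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + delta,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed))
        failed++;   /* lost the race (or a spurious failure): try again */

    if (retries)
        *retries = failed;
    return old + delta;
}
```

A core that keeps losing the line between the load and the store just
keeps going around this loop, which is the starvation described above.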

Thanks,
Vikram

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project