Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations

From: Waiman Long
Date: Wed Apr 10 2013 - 17:26:21 EST


On 04/10/2013 01:16 PM, Ingo Molnar wrote:
* Waiman Long <Waiman.Long@xxxxxx> wrote:

On 04/10/2013 06:31 AM, Ingo Molnar wrote:
* Waiman Long <Waiman.Long@xxxxxx> wrote:

That said, the MUTEX_SHOULD_XCHG_COUNT macro should die. Why shouldn't all
architectures just consider negative counts to be locked? It doesn't matter
that some might only ever see -1.
I think so too. However, I don't have the machines to test the other
architectures. The MUTEX_SHOULD_XCHG_COUNT macro is just a safety measure to make
sure that my code won't screw up the kernel on other architectures. Once it is
confirmed that a negative count other than -1 is fine for all the other
architectures, the macro can certainly go.
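
For reference, below is a rough user-space sketch of the pattern under discussion; the struct and helper names are made up for illustration, and this is not the actual kernel/mutex.c or RFC patch code. The point is simply to do a cheap plain read first and only fall back to the cacheline-dirtying atomic xchg when the count could still be 1; if every architecture treats any negative count as locked, that guard is safe everywhere and no per-architecture macro is needed.

/*
 * User-space sketch of the pattern under discussion -- not the actual
 * kernel/mutex.c or RFC patch code.  Count convention follows the mutex:
 * 1 = unlocked, 0 = locked, negative = locked with (possible) waiters.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_mutex {
	atomic_int count;
};

/*
 * Guard in the spirit of MUTEX_SHOULD_XCHG_COUNT: skip the atomic xchg
 * when a plain read already shows the count cannot be 1.  If all
 * architectures treat any negative count as locked, this check is safe
 * unconditionally and the per-arch macro can go away.
 */
static inline bool should_xchg_count(struct sketch_mutex *lock)
{
	return atomic_load_explicit(&lock->count, memory_order_relaxed) >= 0;
}

/* Slowpath-style attempt: grab the lock only if the count was still 1. */
static bool try_grab_slowpath(struct sketch_mutex *lock)
{
	return should_xchg_count(lock) &&
	       atomic_exchange(&lock->count, -1) == 1;
}
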
I'd suggest just removing it in an additional patch, Cc:-ing
linux-arch@xxxxxxxxxxxxxxxx. The change is very likely to be fine; if not, it's
easy to revert.

Thanks,

Ingo
Yes, I can do that. So can I put your name down as reviewer or ack'er for the
1st patch?
Since I'll typically be the maintainer applying & pushing kernel/mutex.c changes to
Linus via the locking tree, the commit will get a Signed-off-by from me once you
resend the latest state of things - no need to add my Acked-by or Reviewed-by
right now.
Thanks for the explanation. I am still pretty new to the upstream kernel development process.

I'm still hoping for another patch from you that adds queueing to the spinners ...
That approach could offer better performance than current patches 1,2,3. In
theory.

I'd prefer that approach because you have a testcase that shows the problem and
you are willing to maximize performance with it - so we could make sure we have
reached maximum performance instead of dropping patches #2, #3, reaching partial
performance with patch #1, without having a real full resolution.

That is what I hope too. I am going to work on another patch to add spinner queuing and see how much of a performance impact it has.
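
For the queueing idea, here is a minimal user-space sketch of an MCS-style spinner queue, only to illustrate the technique being discussed; the names are hypothetical and this is not code from any posted patch. Each waiter spins on a flag in its own node, so the hand-off write touches only the successor's node and the mutex cacheline stops bouncing among the spinners.

/*
 * Minimal MCS-style queue sketch (illustrative names, not kernel code).
 * A waiter joins the tail, then spins on its own node->locked flag; the
 * departing spinner hands the turn to exactly one successor.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct spin_node {
	_Atomic(struct spin_node *) next;
	atomic_bool locked;		/* true while this waiter must spin */
};

struct spin_queue {
	_Atomic(struct spin_node *) tail;	/* NULL when no spinners */
};

static void spin_queue_wait(struct spin_queue *q, struct spin_node *node)
{
	struct spin_node *prev;

	atomic_store(&node->next, NULL);
	atomic_store(&node->locked, true);

	prev = atomic_exchange(&q->tail, node);	/* join the queue tail */
	if (prev) {
		atomic_store(&prev->next, node);
		/* Local spinning: no contention on the shared mutex line. */
		while (atomic_load_explicit(&node->locked, memory_order_acquire))
			;
	}
}

static void spin_queue_next(struct spin_queue *q, struct spin_node *node)
{
	struct spin_node *next = atomic_load(&node->next);

	if (!next) {
		struct spin_node *expected = node;

		/* No successor yet: try to mark the queue empty. */
		if (atomic_compare_exchange_strong(&q->tail, &expected, NULL))
			return;
		/* Someone is joining; wait for it to link itself in. */
		while (!(next = atomic_load(&node->next)))
			;
	}
	atomic_store_explicit(&next->locked, false, memory_order_release);
}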

BTW, I have also been thinking about extracting the spinlock out of the mutex structure for some busy mutexes by adding a pointer to an external auxiliary structure (separately allocated at init time). The idea is to use the external spinlock if available; otherwise, the internal one is used. That should reduce cacheline contention for some of the busiest mutexes. The spinner queuing tickets could live in the external structure too. However, it requires a one-line change at each mutex initialization site. I haven't actually made the code change and tried it yet, but that is something I am thinking of doing when I have time.
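
To make that idea concrete, here is a rough sketch, again with made-up names and user-space pthread primitives rather than kernel code: the mutex keeps its internal wait lock, but an optionally pre-allocated auxiliary block can supply an external one (and, later, the spinner-queue state), so the one-line change per init site is just passing that block in.

/*
 * Sketch of the "external auxiliary structure" idea -- hypothetical
 * names, user-space pthread primitives, not proposed kernel code.
 * Busy mutexes get a separately allocated aux block so the wait lock
 * (and eventually the spinner-queue tickets) sits on its own cacheline,
 * away from the hot count/owner fields.
 */
#include <pthread.h>
#include <stddef.h>

struct mutex_aux {
	pthread_spinlock_t wait_lock;	/* could also hold queueing tickets */
};

struct busy_mutex {
	int count;			/* stand-in for the hot fields */
	pthread_spinlock_t wait_lock;	/* internal lock: the default */
	struct mutex_aux *aux;		/* external lock: used when non-NULL */
};

/* The one-line change per init site: pass a pre-allocated aux block or NULL. */
static void busy_mutex_init(struct busy_mutex *m, struct mutex_aux *aux)
{
	m->count = 1;
	m->aux = aux;
	pthread_spin_init(aux ? &aux->wait_lock : &m->wait_lock,
			  PTHREAD_PROCESS_PRIVATE);
}

/* Pick the external spinlock if available, otherwise the internal one. */
static pthread_spinlock_t *busy_mutex_wait_lock(struct busy_mutex *m)
{
	return m->aux ? &m->aux->wait_lock : &m->wait_lock;
}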

Thanks,
Longman