Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

From: Nicholas Piggin
Date: Thu Dec 03 2020 - 17:14:32 EST


Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>
>> power: same as ARM, except that the loop may be rather larger since
>> the systems are bigger. But I imagine it's still faster than Nick's
>> approach -- a cmpxchg to a remote cacheline should still be faster than
>> an IPI shootdown.
>
> While a single atomic might be cheaper than an IPI, the comparison
> doesn't work out nicely. You do the xchg() on every unlazy, while the
> IPI would be once per process exit.
>
> So over the life of the process, it might do very many unlazies, adding
> up to a total cost far in excess of what the single IPI would've been.

Yeah this is the concern, I looked at things that add cost to the
idle switch code and it gets hard to justify the scalability improvement
when you slow these fundmaental things down even a bit.

I still think working on the assumption that IPIs = scary expensive
might not be correct. An IPI itself is, but you only issue them when
you've left a lazy mm on another CPU which just isn't that often.

Thanks,
Nick