Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

From: Alexander Gordeev
Date: Thu Dec 03 2020 - 13:35:25 EST


On Thu, Dec 03, 2020 at 09:14:22AM -0800, Andy Lutomirski wrote:
>
>
> > On Dec 3, 2020, at 9:09 AM, Alexander Gordeev <agordeev@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> >> other arch folk: there's some background here:
>
> >
> >>
> >> power: Ridiculously complicated, seems to vary by system and kernel config.
> >>
> >> So, Nick, your unconditional IPI scheme is apparently a big
> >> improvement for power, and it should be an improvement and have low
> >> cost for x86. On arm64 and s390x it will add more IPIs on process
> >> exit but reduce contention on context switching depending on how lazy
> >
> > s390 does not invalidate TLBs per-CPU explicitly - we have special
> > instructions for that. Those in turn initiate signalling to other
> > CPUs, completely transparent to the OS.
>
> Just to make sure I understand: this means that you broadcast flushes to all CPUs, not just a subset?

Correct.
If the mm has one CPU attached, we flush the TLB only for that CPU.
If the mm has more than one CPU attached, we flush the TLBs of all CPUs.

In fact, the details are a bit more complicated, since the hardware
is able to flush subsets of TLB entries depending on the provided
parameters (e.g. the page tables used to create those entries).
But we cannot select a subset of CPUs.
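
To make the rule above concrete, here is a rough user-space sketch in C
of the decision only. All names (choose_flush, cpumask_sketch_t, cpu_bit)
are made up for illustration and are not the s390 kernel code. The
zero-CPU case is my assumption, and the sketch ignores the finer-grained
selection of TLB entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the s390 kernel code. */
enum flush_kind { FLUSH_NONE, FLUSH_LOCAL, FLUSH_GLOBAL };

/* Bit n set means CPU n is attached to the mm (stand-in for mm_cpumask). */
typedef uint64_t cpumask_sketch_t;

static cpumask_sketch_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (cpumask_sketch_t)1 << cpu;
}

/* Count the CPUs attached to the mm by clearing one set bit per step. */
static int attached_cpus(cpumask_sketch_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;
		n++;
	}
	return n;
}

static enum flush_kind choose_flush(cpumask_sketch_t attached)
{
	int n = attached_cpus(attached);

	if (n == 0)
		return FLUSH_NONE;	/* assumption: nothing to flush */
	if (n == 1)
		return FLUSH_LOCAL;	/* flush only the one attached CPU */
	return FLUSH_GLOBAL;		/* no CPU subset possible: flush all */
}
```

The point is that there are only two outcomes once an mm is attached:
one CPU or every CPU.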

> > Apart from mm_count, I am struggling to realize how the suggested
> > scheme could change the contention on s390 in connection with
> > TLB. Could you clarify a bit here, please?
>
> I’m just talking about mm_count. Maintaining mm_count is quite expensive on some workloads.
>
> >
> >> TLB works. I suppose we could try it for all architectures without
> >> any further optimizations. Or we could try one of the perhaps
> >> excessively clever improvements I linked above. arm64, s390x people,
> >> what do you think?
> >
> > I do not immediately see anything in the series that would harm
> > performance on s390.
> >
> > We however use mm_cpumask to distinguish between local and global TLB
> > flushes. With this series it looks like mm_cpumask is *required* to
> > be consistent with lazy users. And that is something quite difficult
> > for us to adhere to (at least in the foreseeable future).
>
> You don’t actually need to maintain mm_cpumask — we could scan all CPUs instead.
>
> >
> > But actually keeping track of lazy users in a cpumask is something
> > the generic code would rather do AFAICT.
>
> The problem is that arches don’t agree on what the contents of mm_cpumask should be. Tracking a mask of exactly what the arch wants in generic code is a nontrivial operation.

It could be yet another cpumask or the CPU scan you mentioned.
Just wanted to make sure there is no new requirement for an arch
to maintain mm_cpumask ;)
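
For the record, this is roughly how I picture the CPU scan variant, as a
user-space sketch in C. It reflects only my reading of the idea, not the
code in the series: struct cpu_sketch, struct mm_sketch and
shoot_lazies_scan are hypothetical names, and where the real scheme would
presumably send an IPI, the sketch just rewrites the per-CPU pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8

/* Hypothetical stand-ins; these are NOT kernel structures. */
struct mm_sketch {
	int id;
};

struct cpu_sketch {
	const struct mm_sketch *active_mm;	/* mm the CPU still references */
	const struct mm_sketch *task_mm;	/* NULL for a kernel thread (lazy) */
};

/*
 * Scan every CPU instead of consulting mm_cpumask. Each CPU that only
 * borrows the dying mm lazily is moved to init_mm. Returns the number of
 * CPUs that were moved.
 */
static int shoot_lazies_scan(struct cpu_sketch *cpus, int ncpus,
			     const struct mm_sketch *dying,
			     const struct mm_sketch *init_mm)
{
	int shot = 0;

	assert(ncpus >= 0 && ncpus <= NR_CPUS_SKETCH);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpus[cpu].active_mm == dying && cpus[cpu].task_mm == NULL) {
			/* a real implementation would signal the CPU here */
			cpus[cpu].active_mm = init_mm;
			shot++;
		}
	}
	return shot;
}
```

Nothing in such a scan depends on what an arch keeps in mm_cpumask.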

Thanks, Andy!