Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski
Date: Fri Jun 01 2018 - 17:22:21 EST


On Fri, Jun 1, 2018 at 1:35 PM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Fri, 2018-06-01 at 13:03 -0700, Andy Lutomirski wrote:
> > Mike, you never did say: do you have PCID on your CPU? Also, what is
> > your workload doing to cause so many switches back and forth between
> > init_mm and a task?
> >
> > The point of the optimization is that switching to init_mm should be
> > fairly fast on a PCID system, whereas an IPI to do the deferred flush
> > is very expensive regardless of PCID.
>
> While I am sure that bit is true, Song and I
> observed about 4x as much CPU use in the atomic
> operations in cpumask_clear_cpu and cpumask_set_cpu
> (inside switch_mm_irqs_off) as we saw CPU used
> in the %cr3 reload itself.
>
> Given how expensive those cpumask updates are,
> lazy TLB mode might always be worth it, especially
> on larger systems.
>

Hmm. I wonder if there's a more clever data structure than a bitmap
that we could be using here. Each CPU only ever needs to be in one
mm's cpumask, and each CPU only ever changes its own state in the
bitmask. And writes are much less common than reads for most
workloads.
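
To make the idea concrete, here is a rough userspace sketch of one
possible shape: a per-CPU slot recording which mm that CPU has loaded,
each slot on its own cache line. Everything in it (mm_sketch, cpu_slot,
cpu_switch_mm, mm_collect_cpus, NR_CPUS_SKETCH, the 64-byte line size)
is a made-up name or assumption for illustration, not existing kernel
code. It uses C11 atomics rather than the kernel's primitives and
ignores the ordering against page table updates that a real version
would need.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8	/* hypothetical CPU count for the sketch */

/* Stand-in for struct mm_struct; only its address matters here. */
struct mm_sketch {
	int id;
};

/*
 * One slot per CPU, aligned to an assumed 64-byte cache line so that
 * a CPU updating its own slot does not bounce a line shared with
 * other CPUs.
 */
struct cpu_slot {
	_Alignas(64) _Atomic(struct mm_sketch *) loaded_mm;
};

static struct cpu_slot cpu_slots[NR_CPUS_SKETCH];

/*
 * Writer side: a CPU only ever stores to its own slot, so a plain
 * release store is enough and no atomic read-modify-write is needed.
 */
static void cpu_switch_mm(int cpu, struct mm_sketch *next)
{
	atomic_store_explicit(&cpu_slots[cpu].loaded_mm, next,
			      memory_order_release);
}

/*
 * Reader side: find the CPUs that currently have @mm loaded by
 * scanning every slot. Fills @out and returns the number of matches.
 */
static int mm_collect_cpus(const struct mm_sketch *mm, int *out)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		struct mm_sketch *cur =
			atomic_load_explicit(&cpu_slots[cpu].loaded_mm,
					     memory_order_acquire);
		if (cur == mm)
			out[n++] = cpu;
	}
	return n;
}
```

The obvious downside is that the reader now scans every CPU's slot
instead of walking one mm's bitmap, which is the wrong direction if
reads really do dominate. So this is only one point in the design
space, not a proposal.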