Re: [PATCH 1/4] mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race

From: Nicholas Piggin
Date: Sun Aug 30 2020 - 21:25:49 EST


Excerpts from peterz@xxxxxxxxxxxxx's message of August 28, 2020 9:15 pm:
> On Fri, Aug 28, 2020 at 08:00:19PM +1000, Nicholas Piggin wrote:
>
>> Closing this race only requires interrupts to be disabled while ->mm
>> and ->active_mm are being switched, but the TLB problem requires also
>> holding interrupts off over activate_mm. Unfortunately not all archs
>> can do that yet, e.g., arm defers the switch if irqs are disabled and
>> expects finish_arch_post_lock_switch() to be called to complete the
>> flush; um takes a blocking lock in activate_mm().
>
> ARM at least has activate_mm() := switch_mm(), so it could be made to
> work.
>

Yeah, so long as that post_lock_switch switch did the right thing with
respect to its TLB flushing. It should do because arm doesn't seem to
check ->mm or ->active_mm (and if it was broken, the scheduler context
switch would be suspect too). I don't think the fix would be hard, just
that I don't have a good way to test it and qemu isn't great for testing
this kind of thing.

um too I think could probably defer that lock until after interrupts are
enabled again. I might throw a bunch of arch conversion patches over the
wall if this gets merged and try to move things along.

Thanks,
Nick