Re: [RFC PATCH v2 11/12] x86/mm/tlb: Use async and inline messages for flushing

From: Andy Lutomirski
Date: Fri May 31 2019 - 17:18:46 EST


On Thu, May 30, 2019 at 11:37 PM Nadav Amit <namit@xxxxxxxxxx> wrote:
>
> When we flush userspace mappings, we can defer the TLB flushes, as long
> the following conditions are met:
>
> 1. No tables are freed, since otherwise speculative page walks might
> cause machine-checks.
>
> 2. No one would access userspace before flush takes place. Specifically,
> NMI handlers and kprobes would avoid accessing userspace.
>

I think I need to ask the big picture question. When someone calls
flush_tlb_mm_range() (or the other entry points), if no page tables
were freed, they want the guarantee that future accesses (initiated
observably after the flush returns) will not use paging entries that
were replaced by stores ordered before flush_tlb_mm_range(). We also
need the guarantee that any effects from any memory access using the
old paging entries will become globally visible before
flush_tlb_mm_range().

I'm wondering if receipt of an IPI is enough to guarantee any of this.
If CPU 1 sets a dirty bit and CPU 2 writes to the APIC to send an IPI
to CPU 1, at what point is CPU 2 guaranteed to be able to observe the
dirty bit? An interrupt entry today is fully serializing by the time
it finishes, but interrupt entries are epically slow, and I don't know
if the APIC waits long enough. Heck, what if IRQs are off on the
remote CPU? There are a handful of places where we touch user memory
with IRQs off, and it's (sadly) possible for user code to turn off
IRQs with iopl().
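
To make the question concrete, here is a rough userspace model of the
handshake, not kernel code. Everything in it is a stand-in I made up for
illustration: `dirty` plays the PTE dirty bit, `ipi_pending` plays the
ICR write, `ipi_acked` plays the remote CPU's acknowledgment, and
`run_model()` is a hypothetical driver. It assumes the remote side
acknowledges with release semantics and the initiator waits with acquire
semantics; under that assumption the initiator must observe the dirty
bit. It says nothing about what the APIC or the hardware page walker
actually guarantee, which is exactly the open question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool dirty;       /* stand-in for a PTE dirty bit */
static atomic_bool ipi_pending; /* stand-in for the ICR write */
static atomic_bool ipi_acked;   /* stand-in for the flush acknowledgment */

/* "CPU 1": dirties a page via the old translation, then takes the IPI. */
static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&dirty, true, memory_order_relaxed);

	/* Spin until the "IPI" arrives. */
	while (!atomic_load_explicit(&ipi_pending, memory_order_acquire))
		;

	/* Release: every earlier store, including dirty, is ordered before the ack. */
	atomic_store_explicit(&ipi_acked, true, memory_order_release);
	return NULL;
}

/* "CPU 2": sends the "IPI", waits for the ack, then reads the dirty bit. */
static bool cpu2_flush_and_check(void)
{
	atomic_store_explicit(&ipi_pending, true, memory_order_release);

	/* Acquire: pairs with the release store in cpu1(). */
	while (!atomic_load_explicit(&ipi_acked, memory_order_acquire))
		;

	return atomic_load_explicit(&dirty, memory_order_relaxed);
}

/* Hypothetical driver: returns whether "CPU 2" saw the dirty bit. */
static bool run_model(void)
{
	pthread_t t;
	bool seen;
	int rc;

	atomic_store(&dirty, false);
	atomic_store(&ipi_pending, false);
	atomic_store(&ipi_acked, false);

	rc = pthread_create(&t, NULL, cpu1, NULL);
	assert(rc == 0);

	seen = cpu2_flush_and_check();
	pthread_join(t, NULL);
	return seen;
}
```

In this model the result is guaranteed only because of the explicit
release/acquire pair on `ipi_acked`. If the initiator stops waiting for
an acknowledgment, or the "ack" is merely the interrupt being delivered,
that pairing is gone and something else has to supply the ordering.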

I *think* that Intel has stated recently that SMT siblings are
guaranteed to stop speculating when you write to the APIC ICR to poke
them, but SMT is very special.

My general conclusion is that I think the code needs to document what
is guaranteed and why.

--Andy