Re: [PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time

From: Andy Lutomirski
Date: Fri Jun 22 2018 - 10:59:03 EST


On Wed, Jun 20, 2018 at 12:57 PM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> Andy discovered that speculative memory accesses while in lazy
> TLB mode can crash a system, when a CPU tries to dereference a
> speculative access using memory contents that used to be valid
> page table memory, but have since been reused for something else
> and point into la-la land.
>
> The latter problem can be prevented in two ways. The first is to
> always send a TLB shootdown IPI to CPUs in lazy TLB mode, while
> the second one is to only send the TLB shootdown at page table
> freeing time.
>
> The second should result in fewer IPIs, since operations like
> mprotect and madvise are very common with some workloads, but
> do not involve page table freeing. Also, on munmap, batching
> of page table freeing covers much larger ranges of virtual
> memory than the batching of unmapped user pages.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Tested-by: Song Liu <songliubraving@xxxxxx>
> ---
> arch/x86/include/asm/tlbflush.h |  5 +++++
> arch/x86/mm/tlb.c               | 24 ++++++++++++++++++++++++
> include/asm-generic/tlb.h       | 10 ++++++++++
> mm/memory.c                     | 22 ++++++++++++++--------
> 4 files changed, 53 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 6690cd3fc8b1..3aa3204b5dc0 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -554,4 +554,9 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
> native_flush_tlb_others(mask, info)
> #endif
>
> +extern void tlb_flush_remove_tables(struct mm_struct *mm);
> +extern void tlb_flush_remove_tables_local(void *arg);
> +
> +#define HAVE_TLB_FLUSH_REMOVE_TABLES
> +
> #endif /* _ASM_X86_TLBFLUSH_H */
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index e055d1a06699..61773b07ed54 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -646,6 +646,30 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> put_cpu();
> }
>
> +void tlb_flush_remove_tables_local(void *arg)
> +{
> +        struct mm_struct *mm = arg;
> +
> +        if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm &&
> +                        this_cpu_read(cpu_tlbstate.is_lazy))
> +                /*
> +                 * We're in lazy mode. We need to at least flush our
> +                 * paging-structure cache to avoid speculatively reading
> +                 * garbage into our TLB. Since switching to init_mm is barely
> +                 * slower than a minimal flush, just switch to init_mm.
> +                 */
> +                switch_mm_irqs_off(NULL, &init_mm, NULL);

Can you add braces?
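
Something like the below, i.e. just wrapping the comment plus the
switch_mm_irqs_off() call in braces (untested sketch of the same logic):

        if (this_cpu_read(cpu_tlbstate.loaded_mm) == mm &&
            this_cpu_read(cpu_tlbstate.is_lazy)) {
                /*
                 * We're in lazy mode. We need to at least flush our
                 * paging-structure cache to avoid speculatively reading
                 * garbage into our TLB. Since switching to init_mm is barely
                 * slower than a minimal flush, just switch to init_mm.
                 */
                switch_mm_irqs_off(NULL, &init_mm, NULL);
        }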

> +}
> +
> +void tlb_flush_remove_tables(struct mm_struct *mm)
> +{
> +        int cpu = get_cpu();
> +        /*
> +         * XXX: this really only needs to be called for CPUs in lazy TLB mode.
> +         */
> +        if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
> +                smp_call_function_many(mm_cpumask(mm), tlb_flush_remove_tables_local, (void *)mm, 1);

I suspect that most of the gain will come from fixing this limitation :)
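
That is, only IPI the CPUs that are actually sitting in lazy TLB mode
instead of everything in mm_cpumask(). Completely untested sketch, and
probably racy as written since is_lazy can change between the check and
the IPI, but roughly along these lines:

        void tlb_flush_remove_tables(struct mm_struct *mm)
        {
                int cpu = get_cpu();
                cpumask_var_t lazymask;
                int other;

                if (!zalloc_cpumask_var(&lazymask, GFP_ATOMIC)) {
                        /* Fall back to IPIing everyone in mm_cpumask(). */
                        if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
                                smp_call_function_many(mm_cpumask(mm),
                                                tlb_flush_remove_tables_local,
                                                (void *)mm, 1);
                        goto out;
                }

                /* Collect only the CPUs that are using mm *and* are lazy. */
                for_each_cpu(other, mm_cpumask(mm))
                        if (per_cpu(cpu_tlbstate.is_lazy, other))
                                cpumask_set_cpu(other, lazymask);

                smp_call_function_many(lazymask, tlb_flush_remove_tables_local,
                                       (void *)mm, 1);
                free_cpumask_var(lazymask);
        out:
                put_cpu();
        }

The GFP_ATOMIC allocation and the is_lazy race both need more thought,
but you get the idea.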