Re: [PATCH] x86/mm: Change barriers before TLB flushes to smp_mb__after_atomic

From: Andy Lutomirski
Date: Thu Jun 09 2016 - 13:19:49 EST


On Fri, May 27, 2016 at 8:16 PM, Nadav Amit <namit@xxxxxxxxxx> wrote:
> When (current->active_mm != mm), flush_tlb_page() does not perform a
> memory barrier. In practice, this memory barrier is not needed, since
> the existing call sites modify the PTE using atomic operations.
> This patch therefore changes the existing smp_mb() in flush_tlb_page()
> to smp_mb__after_atomic() and adds the missing one, while documenting
> the new assumption that flush_tlb_page() makes.
>
> smp_mb__after_atomic() is also added to set_tlb_ubc_flush_pending(),
> since it makes a similar implicit assumption and omits the memory
> barrier.
>
> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
> ---
> arch/x86/mm/tlb.c | 9 ++++++++-
> mm/rmap.c | 3 +++
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index fe9b9f7..2534333 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -242,6 +242,10 @@ out:
>  	preempt_enable();
>  }
>
> +/*
> + * Calls to flush_tlb_page must be preceded by atomic PTE change or
> + * explicit memory-barrier.
> + */
>  void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
> @@ -259,8 +263,11 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
>  			leave_mm(smp_processor_id());
>
>  			/* Synchronize with switch_mm. */
> -			smp_mb();
> +			smp_mb__after_atomic();
>  		}
> +	} else {
> +		/* Synchronize with switch_mm. */
> +		smp_mb__after_atomic();
>  	}
>
>  	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 307b555..60ab0fe 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -613,6 +613,9 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
>  {
>  	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
>
> +	/* Synchronize with switch_mm. */
> +	smp_mb__after_atomic();
> +
>  	cpumask_or(&tlb_ubc->cpumask, &tlb_ubc->cpumask, mm_cpumask(mm));
>  	tlb_ubc->flush_required = true;
>
> --
> 2.7.4
>

This looks fine for x86, but I have no idea whether other
architectures are okay with it. akpm? mm folks?
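
For anyone who wants the ordering argument spelled out, here is a
rough userspace sketch in C11 atomics. It is only a model, not kernel
code: demo_pte, demo_cpumask, demo_flush_side() and demo_switch_side()
are hypothetical names, and the seq_cst fence stands in for
smp_mb__after_atomic() (flusher) and for whatever full barrier the
switch_mm path provides (switcher). The assumption being modelled is
that the flusher changes the PTE with an atomic operation and then
reads mm_cpumask, while the switching CPU sets its cpumask bit and
then reads page tables.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a PTE and mm_cpumask(mm); not kernel symbols. */
static _Atomic uint64_t demo_pte = 0x1001;      /* some "present" mapping */
static _Atomic unsigned long demo_cpumask = 0;  /* one bit per CPU */

/*
 * Flusher side: atomic PTE change, full barrier, then read the cpumask
 * to decide whether any other CPU needs a remote flush.
 */
static bool demo_flush_side(unsigned int self_cpu)
{
    /* Atomic read-modify-write on the PTE (clears the mapping). */
    atomic_exchange_explicit(&demo_pte, 0, memory_order_relaxed);

    /* Stands in for smp_mb__after_atomic(). */
    atomic_thread_fence(memory_order_seq_cst);

    unsigned long mask =
        atomic_load_explicit(&demo_cpumask, memory_order_relaxed);
    return (mask & ~(1UL << self_cpu)) != 0;
}

/*
 * Switcher side: publish this CPU in the cpumask, full barrier, then
 * read the PTE (as a later page-table walk would).
 */
static uint64_t demo_switch_side(unsigned int cpu)
{
    atomic_fetch_or_explicit(&demo_cpumask, 1UL << cpu,
                             memory_order_relaxed);

    /* Stands in for the full barrier on the switch_mm path. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&demo_pte, memory_order_relaxed);
}
```

With a full barrier on both sides, at least one of the two must
observe the other: either the flusher sees the new cpumask bit and
sends the flush, or the switcher sees the updated PTE. Dropping either
fence allows both to miss, which is the case the patch is guarding
against. On x86 the atomic read-modify-write already orders the
accesses, which is why the lighter primitive is enough there; whether
that holds for the mm/rmap.c caller on other architectures is the open
question.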