Re: [RFC][PATCH 1/5] mm: Rework {set,clear,mm}_tlb_flush_pending()

From: Peter Zijlstra
Date: Fri Jul 28 2017 - 13:46:40 EST


On Fri, Jun 09, 2017 at 03:45:54PM +0100, Will Deacon wrote:
> On Wed, Jun 07, 2017 at 06:15:02PM +0200, Peter Zijlstra wrote:
> > Commit:
> >
> > af2c1401e6f9 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates")
> >
> > added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we
> > can solve the same problem without this barrier.
> >
> > If instead we mandate that mm_tlb_flush_pending() is used while
> > holding the PTL we're guaranteed to observe prior
> > set_tlb_flush_pending() instances.
> >
> > For this to work we need to rework migrate_misplaced_transhuge_page()
> > a little and move the test up into do_huge_pmd_numa_page().
> >
> > Cc: Mel Gorman <mgorman@xxxxxxx>
> > Cc: Rik van Riel <riel@xxxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > ---
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -527,18 +527,16 @@ static inline cpumask_t *mm_cpumask(stru
> > */
> > static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> > {
> > - barrier();
> > + /*
> > + * Must be called with PTL held; such that our PTL acquire will have
> > + * observed the store from set_tlb_flush_pending().
> > + */
> > return mm->tlb_flush_pending;
> > }
> > static inline void set_tlb_flush_pending(struct mm_struct *mm)
> > {
> > mm->tlb_flush_pending = true;
> > -
> > - /*
> > - * Guarantee that the tlb_flush_pending store does not leak into the
> > - * critical section updating the page tables
> > - */
> > - smp_mb__before_spinlock();
> > + barrier();
>
> Why do you need the barrier() here? Isn't the ptl unlock sufficient?

So I was going through these here patches again, and wrote the
following comment:

static inline void set_tlb_flush_pending(struct mm_struct *mm)
{
mm->tlb_flush_pending = true;
/*
* The only time this value is relevant is when there are indeed pages
* to flush. And we'll only flush pages after changing them, which
* requires the PTL.
*
* So the ordering here is:
*
* mm->tlb_flush_pending = true;
* spin_lock(&ptl);
* ...
* set_pte_at();
* spin_unlock(&ptl);
*
*
* spin_lock(&ptl)
* mm_tlb_flush_pending();
* ....
* spin_unlock(&ptl);
*
* flush_tlb_range();
* mm->tlb_flush_pending = false;
*/
}

And while the ptl locks are indeed sufficient to constrain the true
assignment, what constrains the false assignment? As in the above there
is nothing stopping the false from ending up visible at
mm_tlb_flush_pending().

Or does flush_tlb_range() have implicit ordering? It does on x86, but is
this generally so?