Re: [PATCH] mm/rmap: fix potential batched TLB flush race

From: Marco Elver
Date: Wed Nov 24 2021 - 03:11:18 EST


On Wed, 24 Nov 2021 at 02:44, Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Marco Elver <elver@xxxxxxxxxx> writes:
>
> > On Tue, 23 Nov 2021 at 08:44, Huang Ying <ying.huang@xxxxxxxxx> wrote:
[...]
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -633,7 +633,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, bool writable)
> >> * before the PTE is cleared.
> >> */
> >> barrier();
> >> - mm->tlb_flush_batched = true;
> >> + atomic_inc(&mm->tlb_flush_batched);
> >
> > The use of barrier() and atomic needs some clarification.
>
> There are some comments above barrier() to describe why it is needed.
> For atomic, because the type of mm->tlb_flush_batched is atomic_t, do we
> need extra clarification?

Apologies, maybe I wasn't clear enough: the existing comment tells me
the clearing of the PTE should never happen after tlb_flush_batched is
set, but only the compiler is considered there. And I become
suspicious whenever I see barrier() paired with an atomic: barrier()
is purely a compiler barrier and does not prevent the CPU from
reordering things, and atomic_inc() does not return a value and is
therefore unordered per Documentation/atomic_t.txt.
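
To make that concrete, here is a minimal sketch of the write side
(hand-written for illustration, not copied from your patch; in the
real code the PTE is cleared in the caller, and the release variant
is only a suggestion):

	/* as proposed: compiler-only ordering */
	ptep_get_and_clear(mm, address, pte);	/* clear the PTE */
	barrier();	/* constrains the compiler, not the CPU */
	atomic_inc(&mm->tlb_flush_batched);	/* unordered RMW */

	/* release-ordered alternative, no barrier() needed */
	ptep_get_and_clear(mm, address, pte);
	atomic_inc_return_release(&mm->tlb_flush_batched);

With the first variant a weakly ordered CPU may still make the
increment visible before the PTE clear; with the second the increment
has release semantics, so the PTE clear cannot be reordered past it by
either the compiler or the CPU.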

> > Is there a
> > requirement that the CPU also doesn't reorder anything after this
> > atomic_inc() (which is unordered)? I.e. should this be
> > atomic_inc_return_release() and remove barrier()?
>
> We don't have an atomic_xx_acquire() to pair with this. So I guess we
> don't need atomic_inc_return_release()?

You have two things stronger than unordered: the atomic_read() whose
result is used in a conditional branch, thus creating a control
dependency that orders later dependent writes; and the
atomic_cmpxchg(), which is fully ordered.
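
Schematically, the reader side looks something like this (a sketch of
the pattern, not the patch's exact code; the helper names and the
condition are placeholders):

	int batch = atomic_read(&mm->tlb_flush_batched);

	if (batch_has_pending_flush(batch)) {	/* branch on the value read */
		/* stores in this branch are ordered after the atomic_read()
		 * by the control dependency */
		flush_tlb_mm(mm);
		/* value-returning RMW: fully ordered when it succeeds */
		atomic_cmpxchg(&mm->tlb_flush_batched, batch,
			       mark_flushed(batch));
	}

Note that a control dependency only orders later stores against the
atomic_read(), not later loads, and the compiler may defeat it if it
can prove the branch away.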

But before all that, I'd still want to understand what ordering
requirements you have. The current comments say only the compiler
needs taming, but does that mean we're fine with the CPU wildly
reordering things?

Thanks,
-- Marco