Re: [PATCH] mm: gup: fix the fast GUP race against THP collapse

From: David Hildenbrand
Date: Tue Sep 06 2022 - 08:51:01 EST


OK, I believe you're referring to this:

folio = try_grab_folio(page, 1, flags);

just earlier in gup_pte_range(). Yes that's true...but it's hidden, which
is unfortunate. Maybe a comment could help.


Most certainly.


If we still intend to change that code, we should fixup all GUP-fast
functions in a similar way. But again, I don't think we need a change here.


It's really rough, having to play this hide-and-seek game of "who did
the memory barrier". And I'm tempted to suggest adding READ_ONCE() to
any and all reads of the page table entries, just to help stay out of
trouble. It's a visual reminder that page table reads are always a
lockless read and are inherently volatile.

Of course, I realize that adding extra READ_ONCE() calls is not a good
thing. It might be a performance hit, although, again, these are
volatile reads by nature, so you probably had a membar anyway.

And looking in reverse, there are actually a number of places here where
we could probably get away with *removing* READ_ONCE()!

Overall, I would be inclined to load up on READ_ONCE() calls, yes. But I
sort of expect to be overridden on that, due to potential performance
concerns, and that's reasonable.

At a minimum we should add a few short comments about what memory
barriers are used, and why we don't need a READ_ONCE() or something
stronger when reading the pte.

Adding more unnecessary memory barriers doesn't necessarily improve the situation.

Messing with memory barriers is and remains absolutely disgusting.

IMHO, only clear documentation and ASCII art can keep it somehow maintainable for human beings.




- * After this gup_fast can't run anymore. This also removes
- * any huge TLB entry from the CPU so we won't allow
- * huge and small TLB entries for the same virtual address
- * to avoid the risk of CPU bugs in that area.
+ * This removes any huge TLB entry from the CPU so we won't allow
+ * huge and small TLB entries for the same virtual address to
+ * avoid the risk of CPU bugs in that area.
+ *
+ * Parallel fast GUP is fine since fast GUP will back off when
+ * it detects PMD is changed.
*/
_pmd = pmdp_collapse_flush(vma, address, pmd);

To follow up on David Hildenbrand's note about this in the nearby thread...
I'm also not sure if pmdp_collapse_flush() implies a memory barrier on
all arches. It definitely does do an atomic op with a return value on x86,
but that's just one arch.


I think a ptep/pmdp clear + TLB flush really has to imply a memory
barrier, otherwise TLB flushing code might easily mess up with
surrounding code. But we should better double-check.

Let's document the function as such, once it's verified: "This is a
guaranteed memory barrier".

Yes. Hopefully it indeed is on all architectures :)

--
Thanks,

David / dhildenb