Re: [PATCH v9 17/24] mm: Protect mm_rb tree with a rwlock

From: Peter Zijlstra
Date: Wed Mar 14 2018 - 04:52:09 EST


On Tue, Mar 13, 2018 at 06:59:47PM +0100, Laurent Dufour wrote:
> This change is inspired by Peter's proposal patch [1], which protected the
> VMA using SRCU. Unfortunately, SRCU does not scale well in that particular
> case, and it introduces major performance degradation due to excessive
> scheduling operations.

Do you happen to have a little more detail on that?

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 34fde7111e88..28c763ea1036 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -335,6 +335,7 @@ struct vm_area_struct {
> 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
> #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> 	seqcount_t vm_sequence;
> +	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
> #endif
> } __randomize_layout;
>
> @@ -353,6 +354,9 @@ struct kioctx_table;
> struct mm_struct {
> 	struct vm_area_struct *mmap;		/* list of VMAs */
> 	struct rb_root mm_rb;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	rwlock_t mm_rb_lock;
> +#endif
> 	u32 vmacache_seqnum;			/* per-thread vmacache */
> #ifdef CONFIG_MMU
> 	unsigned long (*get_unmapped_area) (struct file *filp,

When I tried this, it simply traded contention on mmap_sem for
contention on these two cachelines.

This was for the concurrent fault benchmark, where mmap_sem is only ever
acquired for reading (so no blocking ever happens), and the bottleneck
was really pure cacheline access.
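Concretely, every speculative lookup ends up doing something like the below.
This is my rough reconstruction of the fast path implied by the quoted diff
(field names from the diff; the helper bodies and __free_vma() are
approximations, not the actual patch):

	static struct vm_area_struct *vma_get(struct mm_struct *mm,
					      unsigned long addr)
	{
		struct vm_area_struct *vma;

		read_lock(&mm->mm_rb_lock);	/* shared write: lock cacheline */
		vma = find_vma(mm, addr);	/* the rb-tree walk itself is read-only */
		if (vma)
			atomic_inc(&vma->vm_ref_count);	/* shared write: VMA cacheline */
		read_unlock(&mm->mm_rb_lock);	/* shared write: lock cacheline again */

		return vma;
	}

	static void vma_put(struct vm_area_struct *vma)
	{
		if (atomic_dec_and_test(&vma->vm_ref_count))
			__free_vma(vma);	/* hypothetical release helper */
	}

Every CPU taking a fault does at least four atomic RMW operations on two
shared cachelines, so with enough CPUs the lines just ping-pong.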

Only by using RCU can you avoid that thrashing.
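By contrast, an RCU-style lookup writes nothing that other CPUs read on the
fault path. Very roughly (this is an illustration, not the SRCU approach
from [1]; handle_speculative_fault() here stands in for whatever handles the
fault inside the read-side section):

	static int speculative_fault(struct mm_struct *mm, unsigned long addr)
	{
		struct vm_area_struct *vma;
		int ret = VM_FAULT_RETRY;

		rcu_read_lock();		/* no shared store at all */
		vma = find_vma(mm, addr);	/* tree walk must itself be RCU-safe */
		if (vma) {
			/*
			 * Re-validate against concurrent munmap()/mremap(),
			 * e.g. with the vm_sequence seqcount, and handle the
			 * fault entirely inside the RCU read-side section;
			 * no per-VMA or per-mm atomic is touched.
			 */
			ret = handle_speculative_fault(mm, vma, addr);
		}
		rcu_read_unlock();

		return ret;
	}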

Also note that if your database allocates the one giant mapping, it'll
be _one_ VMA and that vm_ref_count gets _very_ hot indeed.