Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree

From: Uladzislau Rezki
Date: Sat Jan 06 2024 - 11:36:36 EST


>
> On 2024/1/5 18:50, Uladzislau Rezki wrote:
>
> > Hello, Wen Gu.
> >
> > >
> > > Hi Uladzislau Rezki,
> > >
>
> <...>
>
> > > Fortunately, thanks to this patch set, the global vmap_area_lock was
> > > removed and the per-node lock vn->busy.lock was introduced. It is really helpful:
> > >
> > > In 48 CPUs qemu environment, the Requests/s increased by 5 times:
> > > - nginx
> > > - wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
> > >
> > >                 vzalloced shmem    vzalloced shmem (with this patch set)
> > > Requests/sec    113536.56          583729.93
> > >
> > >
> > Thank you for the confirmation that your workload is improved. The "nginx"
> > is 5 times better!
> >
>
> Yes, thank you very much for the improvement!
>
> > > But it also has some overhead, compared to using kzalloced shared memory
> > > or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:
> > >
> > >                 kzalloced shmem    vzalloced shmem (unset CONFIG_HARDENED_USERCOPY)
> > > Requests/sec    831950.39          805164.78
> > >
> > >
> > CONFIG_HARDENED_USERCOPY prevents copying "wrong" memory regions. That is
> > why, for vmalloced memory, it wants to make sure the address really belongs
> > to a vmalloc area; if not, the user-copy is aborted.
> >
> > So there is extra work that involves finding the VA associated with an address.
> >
>
> Yes, and lock contention in finding VA is likely to be a performance bottleneck,
> which is mitigated a lot by your work.
>
> > > So, as a newbie in Linux-mm, I would like to ask for some suggestions:
> > >
> > > Is it possible to further eliminate the overhead caused by lock contention
> > > in find_vmap_area() in this scenario (maybe this is asking too much), or the
> > > only way out is not setting CONFIG_HARDENED_USERCOPY or not using vzalloced
> > > buffer in the situation where concurrent kernel-userspace-copy happens?
> > >
> > Could you please try below patch, if it improves this series further?
> > Just in case:
> >
>
> Thank you! I tried the patch, and it seems that the wait for rwlock_t
> also exists, as much as using spinlock_t. (The flamegraph is attached.
> Not sure why the read_lock waits so long, given that there is no frequent
> write_lock competition)
>
>                 vzalloced shmem (spinlock_t)    vzalloced shmem (rwlock_t)
> Requests/sec    583729.93                       460007.44
>
> So I guess the overhead in finding vmap area is inevitable here and the
> original spin_lock is fine in this series.
>
I have also noticed a performance difference between rwlock and spinlock.
So, yes. This is the extra work we need to do if CONFIG_HARDENED_USERCOPY
is set, i.e. find a VA.
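
To make the extra step concrete, below is a minimal userspace sketch of the
idea, not the kernel code. It assumes a fixed number of nodes, a pthread
mutex standing in for the per-node busy lock, and a small array instead of
an rb-tree. All names (model_init, area_insert, area_find, copy_range_ok,
NR_NODES, NODE_SPAN) are hypothetical and exist only for illustration.

```c
/* Userspace model only; the names here are hypothetical, not kernel symbols. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES   4u        /* assumed number of lock domains */
#define NODE_SPAN  0x10000u  /* assumed address range hashed to one node */
#define MAX_AREAS  16u

struct area { uintptr_t start, end; };  /* covers [start, end) */

struct node {
	pthread_mutex_t lock;   /* stands in for a per-node busy lock */
	struct area areas[MAX_AREAS];
	size_t nr;
};

static struct node nodes[NR_NODES];

static void model_init(void)
{
	for (size_t i = 0; i < NR_NODES; i++) {
		pthread_mutex_init(&nodes[i].lock, NULL);
		nodes[i].nr = 0;
	}
}

static unsigned int addr_to_node(uintptr_t addr)
{
	return (unsigned int)((addr / NODE_SPAN) % NR_NODES);
}

/* Register an area in the node selected by its start address. */
static bool area_insert(uintptr_t start, size_t size)
{
	struct node *n = &nodes[addr_to_node(start)];
	bool ok = false;

	pthread_mutex_lock(&n->lock);
	if (n->nr < MAX_AREAS) {
		n->areas[n->nr].start = start;
		n->areas[n->nr].end = start + size;
		n->nr++;
		ok = true;
	}
	pthread_mutex_unlock(&n->lock);
	return ok;
}

/* Find the area containing addr, holding only one node lock at a time. */
static bool area_find(uintptr_t addr, struct area *out)
{
	unsigned int first = addr_to_node(addr), i = first;

	do {
		struct node *n = &nodes[i];
		bool found = false;

		pthread_mutex_lock(&n->lock);
		for (size_t k = 0; k < n->nr; k++) {
			if (addr >= n->areas[k].start && addr < n->areas[k].end) {
				*out = n->areas[k];
				found = true;
				break;
			}
		}
		pthread_mutex_unlock(&n->lock);
		if (found)
			return true;
		/* The area may start in another node's range; try the next one. */
		i = (i + 1) % NR_NODES;
	} while (i != first);

	return false;
}

/* Model of the copy check: the whole range must sit inside one known area. */
static bool copy_range_ok(uintptr_t addr, size_t len)
{
	struct area a;

	if (!area_find(addr, &a))
		return false;
	return len <= a.end - addr;
}
```

In this model every copy_range_ok() call takes at least one node lock, so
concurrent copies that hash to the same node still serialize on it. That is
the kind of per-copy lookup cost discussed above, in a much simplified form.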

--
Uladzislau Rezki