Re: [RFC PATCH 0/2] improve vmalloc allocation

From: Uladzislau Rezki
Date: Mon Oct 22 2018 - 12:53:02 EST


On Mon, Oct 22, 2018 at 02:51:42PM +0200, Michal Hocko wrote:
> Hi,
> I haven't read through the implementation yet but I have to say that I
> really love this cover letter. It is clear on intention, it covers the
> design at a high enough level to start a discussion, and it provides
> very nice testing coverage. Nice work!
>
> I also think that we need a better performing vmalloc implementation
> long term because of the increasing number of kvmalloc users.
>
> I just have two mostly workflow specific comments.
>
> > You can find a test-suite patch here; it is based on the 4.18 kernel.
> > ftp://vps418301.ovh.net/incoming/0001-mm-vmalloc-stress-test-suite-v4.18.patch
>
> Can you fit this stress test into the standard self test machinery?
>
If you mean "tools/testing/selftests", then I can fit it in as a kernel module.
However, I cannot trigger all of the tests from a kernel module, because 3 of
the 8 tests use the __vmalloc_node_range() function, which is not marked as
EXPORT_SYMBOL.

> > It is fixed by the second commit in this series. Please see the commit
> > message of that patch for a fuller description.
>
> Bug fixes should go first and new functionality should be built on top.
>
Thanks for the good point.

> A kernel crash sounds serious enough to have a fix marked for stable. If
> the fix is too hard/complex then we might consider a revert of the
> faulty commit.
>
The fix is straightforward and easy. It adds a threshold; once that threshold
is passed, we forbid cond_resched_lock() and keep draining the lazy pages.
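
To illustrate the idea, here is a minimal user-space sketch, not the actual
patch. It assumes the threshold is compared against the number of lazy pages
still outstanding; struct drain_state, may_resched() and drain_lazy() are
hypothetical names, and a counter stands in for the place where the kernel
would call cond_resched_lock().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lazy-free bookkeeping; not a kernel type. */
struct drain_state {
	unsigned long lazy_pages;    /* pages still waiting to be freed */
	unsigned long resched_count; /* how many times we were allowed to yield */
};

/* Assumed policy: yielding is allowed only while the backlog is below the threshold. */
static bool may_resched(unsigned long lazy_pages, unsigned long threshold)
{
	return lazy_pages < threshold;
}

/*
 * Free every area in the list; area_pages[i] is the size of area i in pages.
 * Returns the number of areas freed.
 */
static size_t drain_lazy(struct drain_state *st, const unsigned long *area_pages,
			 size_t nr_areas, unsigned long threshold)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr_areas; i++) {
		unsigned long nr = area_pages[i];

		/* Account for the freed area, clamping at zero. */
		st->lazy_pages = (st->lazy_pages > nr) ? st->lazy_pages - nr : 0;
		freed++;

		/* The kernel would call cond_resched_lock() at this point. */
		if (may_resched(st->lazy_pages, threshold))
			st->resched_count++;
	}
	return freed;
}
```

With a large backlog the loop drains without yielding; it only starts to
yield again once the backlog has dropped below the threshold.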

> >
> > 3) This one is related to the PCPU allocator (see pcpu_alloc_test()). In
> > that stress test case I see that the SUnreclaim value in /proc/meminfo
> > keeps increasing, i.e. there is a memory leak somewhere in the percpu
> > allocator. It looks like memory that is allocated by pcpu_get_vm_areas()
> > is sometimes not freed, resulting in a memory leak or a "Kernel panic":
> >
> > ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
>
> It would be great to pinpoint this one before the rework as well.
>
Actually, it has been fixed recently. Roman Gushchin pointed to:

6685b357363b ("percpu: stop leaking bitmap metadata blocks")

I have checked it: it works fine and fixes the leak I see.
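
For anyone who wants to watch the same counter, here is a small user-space
sketch that reads SUnreclaim from /proc/meminfo. It is not part of the test
suite; parse_sunreclaim_kb() and read_sunreclaim_kb() are hypothetical helper
names, and the 8 KiB buffer is an assumption about the size of the file.

```c
#include <stdio.h>
#include <string.h>

/* Return SUnreclaim in kB from /proc/meminfo-style text, or -1 if absent. */
static long parse_sunreclaim_kb(const char *text)
{
	const char *key = "SUnreclaim:";
	const char *p = text;

	while (p && *p) {
		/* Each field starts at the beginning of a line. */
		if (strncmp(p, key, strlen(key)) == 0) {
			long kb;

			if (sscanf(p + strlen(key), "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		p = strchr(p, '\n');
		if (p)
			p++; /* step past the newline to the next line */
	}
	return -1;
}

/* Read the live value; returns -1 if /proc/meminfo cannot be opened. */
static long read_sunreclaim_kb(void)
{
	char buf[8192]; /* assumed to be large enough for /proc/meminfo */
	FILE *f = fopen("/proc/meminfo", "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_sunreclaim_kb(buf);
}
```

Sampling read_sunreclaim_kb() before and after a test run gives a rough
indication of whether unreclaimable slab memory keeps growing.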

Thank you!

--
Vlad Rezki

> Thanks a lot!
> --
> Michal Hocko
> SUSE Labs