[PATCH 0/5] mm: support parallel free of memory

From: Aaron Lu
Date: Fri Feb 24 2017 - 06:40:39 EST


For regular processes, the time taken in the exit() path to free their
used memory is not a problem. But some heavy processes consume several
terabytes of memory, and freeing that memory can take more than ten
minutes.

To optimize this use case, a parallel free method is proposed here.
For a detailed explanation, please refer to patch 2/5.

I'm not sure whether we need patch 4/5, which can keep page accumulation
from being interrupted in some cases (its patch description has more
information). My test case, which only deals with anon memory, of course
gets no benefit from it. It can be safely dropped if it is deemed not useful.

To see how useful the proposed parallel free solution is, a test program
that does a single malloc() of 320G of memory is used; the time reported
is that of the free() call. The test machine is a Haswell EX with
4 nodes/72 cores/144 threads and 512G of memory. All tests were done with
THP disabled.
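The test program itself is not part of this posting. Below is a minimal
sketch of how such a measurement could be taken, assuming glibc's
malloc() serves a request this large with a private anonymous mmap(), so
that free() unmaps it and the kernel tears the pages down in the
caller's context. The time_free() and elapsed_sec() helpers are
hypothetical names, not the author's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_sec(const struct timespec *start,
			  const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: one malloc() of 'size' bytes, fault in every
 * page, then time only the free() call. Returns seconds, or -1.0 if
 * the allocation failed.
 */
static double time_free(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	volatile char *buf;	/* volatile so the page touches are not optimized out */
	size_t off;

	if (page <= 0)
		page = 4096;	/* assumed fallback page size */

	buf = malloc(size);
	if (!buf)
		return -1.0;

	/* Untouched anonymous pages cost almost nothing to free, so populate them. */
	for (off = 0; off < size; off += (size_t)page)
		buf[off] = 1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	free((void *)buf);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed_sec(&start, &end);
}
```

Calling time_free((size_t)320 << 30) from main() would correspond to the
320G case above, given enough RAM; smaller sizes are useful for a quick
check that the harness works.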

kernel                              time
v4.10                               10.8s  ±2.8%
this patch (with default setting)   5.795s ±5.8%
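For reference, the mean times above (ignoring the ± spread) correspond to
a speedup of roughly 1.86x with the default setting:

```latex
\text{speedup} = \frac{t_{\text{v4.10}}}{t_{\text{patched}}}
               = \frac{10.8\,\text{s}}{5.795\,\text{s}} \approx 1.86
```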

Patch 3/5 introduced a dedicated workqueue for the free workers and
here are more results when setting different values for max_active of
this workqueue:

max_active   time
1            8.9s  ±0.5%
2            5.65s ±5.5%
4            4.84s ±0.16%
8            4.77s ±0.97%
16           4.85s ±0.77%
32           6.21s ±0.46%
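Patch 3/5 is not reproduced here. As a rough userspace analogue only
(it swaps the kernel workqueue for POSIX threads, and struct free_batch,
free_worker() and free_batches_parallel() are hypothetical names), the
sketch below shows what capping concurrency at max_active means: at most
max_active workers pull batches from a shared queue until it is empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORKERS 64	/* hypothetical upper bound on max_active */

/* Hypothetical stand-in for one batch of pages gathered for freeing. */
struct free_batch {
	void **ptrs;
	size_t count;
};

/* Shared state: workers claim batches one at a time under the lock. */
struct free_pool {
	struct free_batch *batches;
	size_t nr_batches;
	size_t next;		/* index of the next unclaimed batch */
	size_t freed;		/* total pointers freed, for reporting */
	pthread_mutex_t lock;
};

static void *free_worker(void *arg)
{
	struct free_pool *pool = arg;

	for (;;) {
		struct free_batch *b;
		size_t i;

		/* Claim the next batch, or stop when none are left. */
		pthread_mutex_lock(&pool->lock);
		if (pool->next == pool->nr_batches) {
			pthread_mutex_unlock(&pool->lock);
			break;
		}
		b = &pool->batches[pool->next++];
		pthread_mutex_unlock(&pool->lock);

		/* Free the claimed batch without holding the lock. */
		for (i = 0; i < b->count; i++)
			free(b->ptrs[i]);

		pthread_mutex_lock(&pool->lock);
		pool->freed += b->count;
		pthread_mutex_unlock(&pool->lock);
	}
	return NULL;
}

/*
 * Free every batch using at most max_active concurrent workers.
 * Returns the number of pointers freed.
 */
static size_t free_batches_parallel(struct free_batch *batches, size_t nr,
				    int max_active)
{
	pthread_t tids[MAX_WORKERS];
	struct free_pool pool = { .batches = batches, .nr_batches = nr };
	int i, started = 0;

	if (max_active < 1)
		max_active = 1;
	if (max_active > MAX_WORKERS)
		max_active = MAX_WORKERS;

	pthread_mutex_init(&pool.lock, NULL);
	for (i = 0; i < max_active; i++)
		if (pthread_create(&tids[started], NULL, free_worker, &pool) == 0)
			started++;

	/* If no thread could be started, do all the work in the caller. */
	if (started == 0)
		free_worker(&pool);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&pool.lock);
	return pool.freed;
}
```

Note that the measured results are not monotonic in max_active: 8 was
the fastest setting, and 32 was slower than 2.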

Comments are welcome.

Aaron Lu (5):
mm: add tlb_flush_mmu_free_batches
mm: parallel free pages
mm: use a dedicated workqueue for the free workers
mm: add force_free_pages in zap_pte_range
mm: add debugfs interface for parallel free tuning

include/asm-generic/tlb.h | 12 ++--
mm/memory.c | 138 +++++++++++++++++++++++++++++++++++++++-------
2 files changed, 122 insertions(+), 28 deletions(-)

--
2.9.3