Re: [PATCH v4 2/2] mm, thp: avoid unnecessary swapin in khugepaged

From: Rik van Riel
Date: Tue Mar 22 2016 - 15:21:30 EST


On Mon, 2016-03-21 at 16:36 +0100, Michal Hocko wrote:
> On Sun 20-03-16 20:07:39, Ebru Akagunduz wrote:
> >
> > Currently khugepaged makes swapin readahead to improve
> > THP collapse rate. This patch checks vm statistics
> > to avoid workload of swapin, if unnecessary. So that
> > when system under pressure, khugepaged won't consume
> > resources to swapin.
> OK, so you want to disable the optimization when under the memory
> pressure. That sounds like a good idea in general.
>Â
> > @@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct
> > mm_struct *mm,
> > Â goto out;
> > Â }
> > Â
> > - __collapse_huge_page_swapin(mm, vma, address, pmd);
> > + swap = get_mm_counter(mm, MM_SWAPENTS);
> > + curr_allocstall = sum_vm_event(ALLOCSTALL);
> > + /*
> > + Â* When system under pressure, don't swapin readahead.
> > + Â* So that avoid unnecessary resource consuming.
> > + Â*/
> > + if (allocstall == curr_allocstall && swap != 0)
> > + __collapse_huge_page_swapin(mm, vma, address,
> > pmd);
> this criteria doesn't really make much sense to me. So we are
> checking
> whether there was the direct reclaim invoked since some point in time
> (more on that below) and we take that as a signal of a strong memory
> pressure, right? What if that was quite some time ago? What if we
> didn't
> have a single direct reclaim but the kswapd was busy the whole time.
> Or
> what if the allocstall was from a different numa node?

Do you have a measure in mind that the code should test
against, instead?

I don't think we want page cache turnover to prevent
khugepaged collapsing THPs, but if the system gets
to the point where kswapd is doing pageout IO, or
swapout IO, or kswapd cannot keep up, we should
probably slow down khugepaged.

If another NUMA node is under significant memory
pressure, we probably want the programs from that
node to be able to do some allocations from this
node, rather than have khugepaged consume the memory.

> > Â anon_vma_lock_write(vma->anon_vma);
> > Â
> > @@ -2905,6 +2913,7 @@ static int khugepaged(void *none)
> > Â set_user_nice(current, MAX_NICE);
> > Â
> > Â while (!kthread_should_stop()) {
> > + allocstall = sum_vm_event(ALLOCSTALL);
> > Â khugepaged_do_scan();
> And this sounds even buggy AFAIU. I guess you want to snapshot before
> goint to sleep no? Otherwise you are comparing allocstall diff from a
> very short time period. Or was this an intention and you really want
> to
> watch for events while khugepaged is running? If yes a comment would
> be
> due here.

You are right, the snapshot should be taken after
khugepaged_do_work().

The memory pressure needs to be measured over the
longest time possible between khugepaged runs.

> That being said, is this actually useful in the real life? Basing
> your
> decision on something as volatile as the direct reclaim would lead to
> rather volatile results. E.g. how stable are the numbers during your
> test?
>
> Wouldn't it be better to rather do an optimistic swapin and back out
> if the direct reclaim is really required. I realize this will be a
> much
> bigger change but it would make more sense I guess.

That depends on how costly swap IO is.

Having khugepaged be on the conservative side is probably
a good idea, given how many systems out there still have
hard drives today.

--
All Rights Reversed.

Attachment: signature.asc
Description: This is a digitally signed message part