Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management

From: Srivatsa S. Bhat
Date: Thu Sep 26 2013 - 14:37:44 EST

Next message: David Rientjes: "Re: [PATCH V1] oom: avoid selecting threads sharing mm with init"
Previous message: John Stultz: "[PATCH 4/4] [RFC] ipv6: Fix for possible ipv6 seqlock deadlock"
In reply to: Arjan van de Ven: "Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management"
Next in thread: Srivatsa S. Bhat: "Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 09/26/2013 11:36 PM, Arjan van de Ven wrote:
>>>>>
>>>>
>>>> Arjan, are you referring to the fact that Intel/SNB systems can exploit
>>>> memory self-refresh only when the entire system goes idle? Is that why
>>>> this
>>>> patchset won't turn out to be that useful on those platforms?
>>>
>>> no we can use other things (CKE and co) all the time.
>>>
>>
>> Ah, ok..
>>
>>> just that we found that statistical grouping gave 95%+ of the benefit,
>>> without the cost of being aggressive on going to a 100.00% grouping
>>>
>>
>> And how do you do that statistical grouping? Don't you need patches
>> similar
>> to those in this patchset? Or are you saying that the existing vanilla
>> kernel itself does statistical grouping somehow?
>
> so the way I scanned your patchset.. half of it is about grouping,
> the other half (roughly) is about moving stuff.
>

Actually, either by number-of-lines or by patch count, a majority of the
patchset is about grouping, and only a few patches do the moving part.

As I mentioned in my earlier mail, patches 1-33 achieve the grouping,
whereas patches 34-40 do the movement. (Both sorted-buddy allocator and
the region allocators are grouping techniques.)

And v3 of this patchset actually didn't have the movement stuff at all,
it just had the grouping parts. And they got me upto around 120 free-regions
at the end of test run - a noticeably better consolidation ratio compared
to mainline (18).

http://article.gmane.org/gmane.linux.kernel.mm/106283

> the grouping makes total sense to me.

Ah, great!

> actively moving is the part that I am very worried about; that part
> burns power to do
> (and performance).... for which the ROI is somewhat unclear to me
> (but... data speaks. I can easily be convinced with data that proves one
> way or the other)
>

Actually I have added some intelligence in the moving parts to avoid being
too aggressive. For example, I don't do _any_ movement if more than 32 pages
in a region are used, since it will take a considerable amount of work to
evacuate that region. Further, my evacuation/compaction technique is very
conservative:
1. I reclaim only clean page-cache pages. So no disk I/O involved.
2. I move movable pages around.
3. I allocate target pages for migration using the fast buddy-allocator
itself, so there is not a lot of PFN scanning involved.

And that's it! No other case for page movement. And with this conservative
approach itself, I'm getting great consolidation ratios!
I am also thinking of adding more smartness in the code to be very choosy in
doing the movement, and do it only in cases where it is almost guaranteed to
be beneficial. For example, I can make the kmempowerd kthread more "lazy"
while moving/reclaiming stuff; I can bias the page movements such that "cold"
pages are left around (since they are not expected to be referenced much
anyway) and only the (few) hot pages are moved... etc.

And this aggressiveness can be exposed as a policy/knob to userspace as well,
so that the user can control its degree as he wishes.

> is moving stuff around the
> 95%-of-the-work-for-the-last-5%-of-the-theoretical-gain
> or is statistical grouping enough to get > 95% of the gain... without
> the cost of moving.
>

I certainly agree with you on the part that moving pages should really be
a last resort sort of thing, and do it only where it really pays off. So
we should definitely go with grouping first, and then see how much additional
benefit the moving stuff will bring along with the involved overhead (by
appropriate benchmarking).

But one of the goals of this patchset was to give a glimpse of all the
techniques/algorithms we can employ to consolidate memory references, and get
an idea of the extent to which such algorithms would be effective in getting
us excellent consolidation ratios.

And now that we have several techniques to choose from (and with varying
qualities and aggressiveness), we can start evaluating them more deeply and
choose the ones that give us the most benefits with least cost/overhead.

>
>>
>> Also, I didn't fully understand how NUMA policy will help in this case..
>> If you want to group memory allocations/references into fewer memory
>> regions
>> _within_ a node, will NUMA policy really help? For example, in this
>> patchset,
>> everything (all the allocation/reference shaping) is done _within_ the
>> NUMA boundary, assuming that the memory regions are subsets of a NUMA
>> node.
>>

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David Rientjes: "Re: [PATCH V1] oom: avoid selecting threads sharing mm with init"
Previous message: John Stultz: "[PATCH 4/4] [RFC] ipv6: Fix for possible ipv6 seqlock deadlock"
In reply to: Arjan van de Ven: "Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management"
Next in thread: Srivatsa S. Bhat: "Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]