Re: [PATCH] mm: sort freelist by rank number

From: Cho KyongHo
Date: Tue Aug 18 2020 - 05:02:33 EST


On Mon, Aug 10, 2020 at 09:32:18AM +0200, David Hildenbrand wrote:
> On 07.08.20 09:08, Pekka Enberg wrote:
> > Hi Cho and David,
> >
> > On Mon, Aug 3, 2020 at 10:57 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 03.08.20 08:10, pullip.cho@xxxxxxxxxxx wrote:
> >>> From: Cho KyongHo <pullip.cho@xxxxxxxxxxx>
> >>>
> >>> LPDDR5 introduces rank switch delay. If three successive DRAM accesses
> >>> happens and the first and the second ones access one rank and the last
> >>> access happens on the other rank, the latency of the last access will
> >>> be longer than the second one.
> >>> To address this panelty, we can sort the freelist so that a specific
> >>> rank is allocated prior to another rank. We expect the page allocator
> >>> can allocate the pages from the same rank successively with this
> >>> change. It will hopefully improves the proportion of the consecutive
> >>> memory accesses to the same rank.
> >>
> >> This certainly needs performance numbers to justify ... and I am sorry,
> >> "hopefully improves" is not a valid justification :)
> >>
> >> I can imagine that this works well initially, when there hasn't been a
> >> lot of memory fragmentation going on. But quickly after your system is
> >> under stress, I doubt this will be very useful. Proof me wrong. ;)
> >>
> >> ... I dislike this manual setting of "dram_rank_granule". Yet another mm
> >> feature that can only be enabled by a magic command line parameter where
> >> users have to guess the right values.
> >>
> >> (side note, there have been similar research approaches to improve
> >> energy consumption by switching off ranks when not needed).
> >
> > I was thinking of the exact same thing. PALLOC [1] comes to mind, but
> > perhaps there are more recent ones?
>
> A more recent one is "Footprint-Based DIMM Hotplug"
> (https://protect2.fireeye.com/v1/url?k=adc28c8b-f0128447-adc307c4-000babff3793-131bb23ec7a60bc9&q=1&e=4c1c9d3c-07c1-4d9a-bb4a-510a0304194a&u=https%3A%2F%2Fdl.acm.org%2Fdoi%2Fabs%2F10.1109%2FTC.2019.2945562), which triggers
> memory onlinng/offlining from the kernel to disable banks where possible
> (I don't think the approach is upstream material in that form).
>
> Also, I stumbled over "Towards Practical Page Placement for a Green
> Memory Manager" (https://ieeexplore.ieee.org/document/7397629),
> proposing an adaptive buddy allocator that tries to keep complete banks
> free in the buddy where possible. That approach sounded quite
> interesting while skimming over the paper.

The researches look like a linux support for partial array self-refresh.
Instead of choosing predefined memory array (bank, rank or segment) it
hot-removes in a channel(DIMM) granule.

Thank you for addressing the research. I need to look into the paper. I
also intersted in that area.

> >
> > I also dislike the manual knob, but is there a way for the OS to
> > detect this by itself? My (perhaps outdated) understanding was that
> > the DRAM address mapping scheme, for example, is not exposed to the
> > OS.
>
> I guess one universal approach is by measuring access times ... not what
> we might be looking for :)
>
> >
> > I think having more knowledge of DRAM controller details in the OS
> > would be potentially beneficial for better page allocation policy, so
> > maybe try come up with something more generic, even if the fallback to
> > providing this information is a kernel command line option.
> >

I don't find if there is a way to deliver detailed DRAM information
through ACPI, ATAG or something similar. But I didn't find.
As you mentiond above, if the kernel has knowledge of DRAM controllers,
it would be also beneficial for power management as well as page allocations.
PASR has not come to Linux due to the complexity, for example. If kernel
knows the granule of hot-add/remove for PASR, we can start discussions
how to support PASR in Linux.

Thank you for the opinions.

KyongHo