RE: [PATCH v2 0/3] mm/damon: Profiling enhancements for DAMON

From: Prasad, Aravinda
Date: Tue Mar 19 2024 - 06:58:15 EST




> -----Original Message-----
> From: SeongJae Park <sj@xxxxxxxxxx>
> Sent: Tuesday, March 19, 2024 10:51 AM
> To: Prasad, Aravinda <aravinda.prasad@xxxxxxxxx>
> Cc: damon@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; sj@xxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; s2322819@xxxxxxxx; Kumar, Sandeep4
> <sandeep4.kumar@xxxxxxxxx>; Huang, Ying <ying.huang@xxxxxxxxx>;
> Hansen, Dave <dave.hansen@xxxxxxxxx>; Williams, Dan J
> <dan.j.williams@xxxxxxxxx>; Subramoney, Sreenivas
> <sreenivas.subramoney@xxxxxxxxx>; Kervinen, Antti
> <antti.kervinen@xxxxxxxxx>; Kanevskiy, Alexander
> <alexander.kanevskiy@xxxxxxxxx>
> Subject: Re: [PATCH v2 0/3] mm/damon: Profiling enhancements for DAMON
>
> Hi Aravinda,
>
>
> Thank you for posting this new revision!
>
> I remember I told you that I didn't see significant high-level problems in my
> reply to the previous revision of this patch[1], but I see a concern now.
> Sorry for not raising this earlier, but let me explain my humble concerns before
> it gets any later.

Sure, no problem. We can discuss. I will get back to you with a detailed note.

Regards,
Aravinda

>
> On Mon, 18 Mar 2024 18:58:45 +0530 Aravinda Prasad
> <aravinda.prasad@xxxxxxxxx> wrote:
>
> > DAMON randomly samples one or more pages in every region and tracks
> > accesses to them using the ACCESSED bit in PTE (or PMD for 2MB pages).
> > When the region size is large (e.g., several GBs), which is common for
> > large footprint applications, detecting whether the region is accessed
> > or not completely depends on whether the pages that are actively
> > accessed in the region are picked during random sampling.
> > If such pages are not picked for sampling, DAMON fails to identify the
> > region as accessed. However, increasing the sampling rate or
> > increasing the number of regions increases CPU overheads of kdamond.
>
> DAMON uses sampling because it considers a region as accessed if a portion of
> the region that is big enough to be detected via sampling is all accessed. If a
> region has some pages that are really accessed but their proportion is too
> small to be found via sampling, I think DAMON could say the overall access to
> the region is only modest and could even be ignored. In my humble opinion,
> this fits with the definition of a DAMON region: a memory address range that
> is made up of pages having similar access frequency.


>
> >
> > This patch proposes profiling different levels of the
> > application's page table tree to detect whether a region is
> > accessed or not. This patch set is based on the observation that, when
> > the accessed bit for a page is set, the accessed bits at the higher
> > levels of the page table tree (PMD/PUD/PGD) corresponding to the path
> > of the page table walk are also set. Hence, it is efficient to check
> > the accessed bits at the higher levels of the page table tree to
> > detect whether a region is accessed or not. For example, if the accessed
> > bit for a PUD entry is set, then one or more pages in the 1GB PUD
> > subtree are accessed, as each PUD entry covers a 1GB mapping. Hence,
> > instead of sampling thousands of 4K/2M pages to detect accesses in a
> > large region, sampling at the higher level of the page table tree is faster
> > and more efficient.
>
> For the above reason, I am concerned this could bias DAMON's monitoring
> results toward reporting more accesses than really happen.
>
> >
> > This patch set is based on the 6.8-rc5 kernel (commit: f48159f8,
> > mm-unstable tree).
> >
> > Changes since v1 [1]
> > ====================
> >
> > - Added support for 5-level page table tree
> > - Split the patch to mm infrastructure changes and DAMON enhancements
> > - Code changes as per comments on v1
> > - Added kerneldoc comments
> >
> > [1] https://lkml.org/lkml/2023/12/15/272
> >
> > Evaluation:
> >
> > - MASIM benchmark with 1GB, 10GB, and 100GB footprints with 10% hot
> >   data, and a 5TB footprint with 10GB hot data.
> > - DAMON: 5ms sampling, 200ms aggregation interval. All other
> >   parameters set to their default values.
> > - DAMON+PTP: Page table profiling applied to DAMON with the above
> >   parameters.
> >
> > Profiling efficiency in detecting hot data:
> >
> > Footprint   1GB    10GB   100GB   5TB
> > ---------------------------------------------
> > DAMON       >90%   <50%   ~0%     0%
> > DAMON+PTP   >90%   >90%   >90%    >90%
>
> The sampling interval is a time interval that is assumed to be large enough
> for the workload to make a meaningful amount of accesses within it. Hence,
> a meaningful sampling interval depends on the workload's characteristics
> and the system's memory bandwidth.
>
> Here, the size of the hot memory region is about 100MB, 1GB, 10GB, and
> 10GB for the four cases, respectively. And you set the sampling interval to
> 5ms. Let's assume the system can access, say, 50 GB per second, and hence it
> could access only up to 250 MB per 5ms. So, in the case of the 1GB
> footprint, the whole hot memory region would be accessed while DAMON is
> waiting for the next sampling interval. Hence, DAMON would be able to see most
> accesses via sampling. But for the 100GB footprint case, only 250MB / 10GB =
> about 2.5% of the hot memory region would be accessed within one
> sampling interval. DAMON cannot see all of the accesses, and hence the
> precision could be low.
>
> I don't know the exact memory bandwidth of the system, but to detect the 10 GB
> hot region with a 5ms sampling interval, the system should be able to access
> 2GB of memory per millisecond, or about 2TB of memory per second. I think
> systems with such memory bandwidth are not that common.
>
> I see you also explored a configuration with a higher aggregation interval.
> But because each sample only checks the accesses made within one sampling
> interval, that might not help in this setup. I'm wondering if you also
> explored increasing the sampling interval.
>
> Sorry again for not raising this concern earlier. But I think we may need
> to discuss this first.
>
> [1] https://lkml.kernel.org/r/20231215201159.73845-1-sj@xxxxxxxxxx
>
>
> Thanks,
> SJ
>
>
> >
> > CPU overheads (in billion cycles) for kdamond:
> >
> > Footprint   1GB    10GB   100GB   5TB
> > ---------------------------------------------
> > DAMON       1.15   19.53  3.52    9.55
> > DAMON+PTP   0.83   3.20   1.27    2.55
> >
> > A detailed explanation and evaluation can be found in the arXiv paper:
> > https://arxiv.org/pdf/2311.10275.pdf
> >
> >
> > Aravinda Prasad (3):
> > mm/damon: mm infrastructure support
> > mm/damon: profiling enhancement
> > mm/damon: documentation updates
> >
> > Documentation/mm/damon/design.rst | 42 ++++++
> > arch/x86/include/asm/pgtable.h | 20 +++
> > arch/x86/mm/pgtable.c | 28 +++-
> > include/linux/mmu_notifier.h | 36 +++++
> > include/linux/pgtable.h | 79 ++++++++++
> > mm/damon/vaddr.c | 233 ++++++++++++++++++++++++++++--
> > 6 files changed, 424 insertions(+), 14 deletions(-)
> >
> > --
> > 2.21.3