Re: [PATCH v3 5/5] KVM: selftests: access_tracking_perf_test: Use MGLRU for access tracking

From: Sean Christopherson
Date: Wed Apr 30 2025 - 17:43:03 EST


On Tue, Apr 29, 2025, James Houghton wrote:
> On Mon, Apr 28, 2025 at 9:19 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > Using MGLRU on my home box fails.  It's full cgroup v2, and has both
> > CONFIG_IDLE_PAGE_TRACKING=y and MGLRU enabled.
> >
> > ==== Test Assertion Failure ====
> >   access_tracking_perf_test.c:244: false
> >   pid=114670 tid=114670 errno=17 - File exists
> >      1  0x00000000004032a9: find_generation at access_tracking_perf_test.c:244
> >      2  0x00000000004032da: lru_gen_mark_memory_idle at access_tracking_perf_test.c:272
> >      3  0x00000000004034e4: mark_memory_idle at access_tracking_perf_test.c:391
> >      4   (inlined by) run_test at access_tracking_perf_test.c:431
> >      5  0x0000000000403d84: for_each_guest_mode at guest_modes.c:96
> >      6  0x0000000000402c61: run_test_for_each_guest_mode at access_tracking_perf_test.c:492
> >      7  0x000000000041d8e2: cg_run at cgroup_util.c:382
> >      8  0x00000000004027fa: main at access_tracking_perf_test.c:572
> >      9  0x00007fa1cb629d8f: ?? ??:0
> >     10  0x00007fa1cb629e3f: ?? ??:0
> >     11  0x00000000004029d4: _start at ??:?
> >   Could not find a generation with 90% of guest memory (235929 pages).
> >
> > Interestingly, if I force the test to use /sys/kernel/mm/page_idle/bitmap, it
> > passes.
> >
> > Please try to reproduce the failure (assuming you haven't already tested that
> > exact combination of cgroups v2, MGLRU=y, and CONFIG_IDLE_PAGE_TRACKING=y). I
> > don't have bandwidth to dig any further at this time.
>
> Sorry... please see the bottom of this message for a diff that should fix this.
> It fixes these bugs:
>
> 1. Tracking generation numbers without hardware Accessed bit management.
> (This is the addition of lru_gen_last_gen.)
> 1.5 It does an initial aging pass so that pages always move to newer
> generations in (or before) the subsequent aging passes. This probably
> isn't needed given the change I made for (1).
> 2. Fixes the expected number of pages for guest page sizes > PAGE_SIZE.
> (This is the move of test_pages. test_pages has also been renamed to avoid
> shadowing.)
> 3. Fixes an off-by-one error when looking for the generation with the most
> pages. Previously it failed to check the youngest generation, which I think
> is the bug you ran into. (This is the change to lru_gen_util.c.)

Ya, this was the bug I initially ran into.  I also encountered more failures
after applying just that fix, but with the full diff applied it's passing, so
this is good to go for the next version from my end.
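
For anyone following along, the off-by-one described in (3) has roughly the
shape below.  This is only an illustrative sketch under stated assumptions:
struct gen_stats, its fixed-size array, and both function names are made up
for this example and are not the actual lru_gen_util.c code.

```c
#include <assert.h>

/* Hypothetical upper bound on tracked generations, for this sketch only. */
#define SKETCH_MAX_GENS 8

/*
 * Hypothetical per-memcg generation stats.  nr_pages[i] holds the page
 * count of generation (min_gen + i); max_gen is the youngest generation
 * and is inclusive.
 */
struct gen_stats {
	int min_gen;
	int max_gen;
	long nr_pages[SKETCH_MAX_GENS];
};

/* Buggy variant: 'gen < max_gen' never looks at the youngest generation. */
static int find_biggest_gen_buggy(const struct gen_stats *s)
{
	int best = s->min_gen;

	assert(s->max_gen - s->min_gen < SKETCH_MAX_GENS);

	for (int gen = s->min_gen; gen < s->max_gen; gen++) {
		if (s->nr_pages[gen - s->min_gen] > s->nr_pages[best - s->min_gen])
			best = gen;
	}
	return best;
}

/* Fixed variant: 'gen <= max_gen' includes the youngest generation. */
static int find_biggest_gen(const struct gen_stats *s)
{
	int best = s->min_gen;

	assert(s->max_gen - s->min_gen < SKETCH_MAX_GENS);

	for (int gen = s->min_gen; gen <= s->max_gen; gen++) {
		if (s->nr_pages[gen - s->min_gen] > s->nr_pages[best - s->min_gen])
			best = gen;
	}
	return best;
}
```

If nearly all guest pages sit in the youngest generation, the buggy loop
reports an older generation with far fewer pages, which would be consistent
with the "Could not find a generation with 90% of guest memory" assertion
quoted above.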