Re: [PATCH v4 4/4] selftests: cgroup: add a selftest for memory.reclaim

From: Shakeel Butt
Date: Mon Apr 25 2022 - 11:16:22 EST


On Sat, Apr 23, 2022 at 02:43:13PM -0700, Yosry Ahmed wrote:
[...]
> > > + cg_run_nowait(memcg, alloc_pagecache_50M_noexit, (void *)(long)fd);
> > > + sleep(1);
> >
> > These sleep(1)s do not seem robust. Since kernel keeps the page cache
> > around, you can convert anon to use tmpfs and use simple cg_run to
> > trigger the allocations of anon (tmpfs) and file which will remain in
> > memory even after return from cg_run.
>
> Other tests in the file are also using sleep approach (see
> test_memcg_min, although it retries for multiple times until
> memory.current reaches an expected amount). In my experience it hasn't
> been flaky running for multiple times on different machines, but I
> agree it can be flaky (false negative).
>

If other tests are doing the same then ignore this comment for now.
There should be a separate effort to move towards more deterministic
approach for the tests instead of sleep().

> I am not sure about the allocating file pages with cg_run, is it
> guaranteed that the page cache will remain in memory until the test
> ends? If it doesn't, it can also flake, but it would produce false
> positives (the test could pass because the kernel drained page cache
> for some other reason although the interface is not working
> correctly).
>
> In my personal opinion, false negative flakes are better than false
> positives. At least currently the test explicitly and clearly fails if
> the allocations are not successful. If we rely on the page cache
> remaining until the test finishes then it could silently pass if the
> interface is not working correctly.
>
> There are a few ways we can go forward with this:
> 1) Keep everything as-is, but print a message if the test fails due to
> memory.current not reaching 100MB to make it clear that it didn't fail
> due to a problem with the interface.
> 2) Add a sleep/retry loop similar to test_memcg_min instead of sleeping once.
> 3) Send a signal from forked children when they are done with the
> allocation, and wait to receive this signal in the test to make sure
> the allocation is completed.
>
> In my opinion we should do (1) (and maybe (2)) for now as (3) could be
> an overkill if the test is normal passing. Maybe add a comment about
> (3) being an option in the future if the test flakes. Let me know what
> you think?

I am ok with (1).