Re: [PATCH] memcg: hugetlbfs basic usage accounting

From: Roman Gushchin
Date: Wed Nov 15 2017 - 06:19:04 EST


On Wed, Nov 15, 2017 at 09:35:04AM +0100, Michal Hocko wrote:
> On Tue 14-11-17 17:24:29, Roman Gushchin wrote:
> > This patch implements basic accounting of memory consumption
> > by hugetlbfs pages for cgroup v2 memory controller.
> >
> > Cgroup v2 memory controller lacks any visibility into the
> > hugetlbfs memory consumption. Cgroup v1 implemented a separate
> > hugetlbfs controller, which provided such stats, and also
> > provided some control abilities. Although porting of the
> > hugetlbfs controller to cgroup v2 is arguably a good idea and
> > is outside the scope of this patch, it's very useful to have
> > basic stats provided by memory.stat.

Hi, Michal!

> Separate hugetlb cgroup controller was really a deliberate decision.
> We didn't want to mix hugetlb with the reclaimable memory. There is no
> reasonable way to enforce memcg limits if hugetlb pages are involved.
>
> AFAICS your patch shouldn't break the hugetlb controller because that
> one (ab)uses page[2].private to store the hstate for the accounting.
> You also do not really charge those hugetlb pages so the memcg
> accounting will work unchanged.

Yes, you are right.

>
> So my primary question is, why don't you simply allow hugetlb controller
> rather than tweak stats for memcg? Is there any fundamental reason why
> hugetlb controller is not v2 compatible?

I really don't know if the hugetlb controller has enough users to deserve
full support in the v2 interface: adding knobs like memory.hugetlb.current,
memory.hugetlb.min, memory.hugetlb.high, memory.hugetlb.max, etc.

I'd be rather conservative here and avoid adding a lot to the interface
without clear demand. Also, hugetlb pages are really special, and it's
at least not obvious how, say, memory.high should work for them.

At the same time we don't really have any accounting of hugetlb page
usage (except system-wide stats in sysfs). And providing such stats
is really useful.
In my particular case, I have some number of pre-allocated hugepages,
and I have several containerized workloads, which are potentially
using them to get performance bonuses. Having these stats makes it
possible to attribute the memory held by hugetlb pages to one of the
workloads.
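
To illustrate the attribution use case, here is a minimal sketch of how
a monitoring script might consume such a counter. It assumes the usual
flat "key value" layout of memory.stat; the key name "hugetlb" and both
helper functions are hypothetical and are not taken from the patch.

```python
from typing import Dict


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the flat "key value" lines of a cgroup v2 memory.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def hugetlb_usage(stat_texts: Dict[str, str],
                  key: str = "hugetlb") -> Dict[str, int]:
    """Map each cgroup name to the value of its hugetlb counter.

    `key` is an assumed name for the counter; a cgroup whose
    memory.stat lacks that key is reported as 0.
    """
    return {
        name: parse_memory_stat(text).get(key, 0)
        for name, text in stat_texts.items()
    }
```

In practice, the text for each workload would be read from that
cgroup's memory.stat file under the cgroup v2 mount point, and the
resulting per-cgroup values compared against the pre-allocated pool.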

> It feels really strange to keep stats of something the controller
> doesn't really control. I can imagine confused users claiming that
> numbers just do not add up...

This is why I do not add this number to memory.current. At the same
time numbers in memory.stat are not intended to be summed (we have
event counters there, dirty pages counter, etc), so I don't see a problem.

Thanks!