Re: [PATCH -V6 07/14] memcg: Add HugeTLB extension

From: David Rientjes
Date: Thu May 24 2012 - 17:52:27 EST


On Mon, 16 Apr 2012, Aneesh Kumar K.V wrote:

> This patch implements a memcg extension that allows us to control HugeTLB
> allocations via memory controller. The extension allows to limit the
> HugeTLB usage per control group and enforces the controller limit during
> page fault. Since HugeTLB doesn't support page reclaim, enforcing the limit
> at page fault time implies that, the application will get SIGBUS signal if it
> tries to access HugeTLB pages beyond its limit. This requires the application
> to know beforehand how much HugeTLB pages it would require for its use.
>
> The charge/uncharge calls will be added to HugeTLB code in later patch.
> Support for memcg removal will be added in later patches.
>

Again, I disagree with this approach because it's adding the functionality
to memcg when it's unnecessary; it would be a complete legitimate usecase
to want to limit the number of globally available hugepages to a set of
tasks without incurring the per-page tracking from memcg.

This can be implemented as a seperate cgroup and as we move to a single
hierarchy, you lose no functionality if you mount both cgroups from what
is done here.

It would be much cleaner in terms of

- build: not requiring ifdefs and dependencies on CONFIG_HUGETLB_PAGE,
which is a prerequisite for this functionality and is not for
CONFIG_CGROUP_MEM_RES_CTLR,

- code: seperating hugetlb bits out from memcg bits to avoid growing
mm/memcontrol.c beyond its current 5650 lines, and

- performance: not incurring any overhead of enabling memcg for per-
page tracking that is unnecessary if users only want to limit hugetlb
pages.

Kmem accounting and swap accounting is really a seperate topic and makes
sense to be incorporated directly into memcg because their usage is a
single number, the same is not true for hugetlb pages where charging one
1GB page is not the same as charging 512 2M pages. And we have no
usecases for wanting to track kmem or swap only without user page
tracking, what would be the point?

There's a reason we don't enable CONFIG_CGROUP_MEM_RES_CTLR in the
defconfig, we don't want the extra 1% metadata overhead of enabling it and
the potential performance regression from doing per-page tracking if we
only want to limit a global resource (hugetlb pages) to a set of tasks.

So please consider seperating this functionality out into its own cgroup,
there's no reason not to do it and it would benefit hugetlb users who
don't want to incur the disadvantages of enabling memcg entirely.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/