Re: [RFC PATCH 0/6] hugetlbfs: Add cgroup resource controller for hugetlbfs

From: Aneesh Kumar K.V
Date: Fri Feb 17 2012 - 03:04:12 EST

Hi Kamezawa,

Sorry for the late response; I was out of the office for the last few days.

On Tue, 14 Feb 2012 15:58:43 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Sat, 11 Feb 2012 03:06:40 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > This patchset implements a cgroup resource controller for HugeTLB pages.
> > It is similar to the existing hugetlb quota support in that the limit is
> > enforced at mmap(2) time and not at fault time. HugeTLB quota limits the
> > number of huge pages that can be allocated per superblock.
> >
> > For shared mappings we track the region mapped by a task along with the
> > hugetlb cgroup in the inode region list. We keep the hugetlb cgroup charged
> > even after the task that did mmap(2) exits. The uncharge happens during
> > file truncate. For private mappings we charge and uncharge from the current
> > task cgroup.
> >
>
> Hm, could you provide a Documentation/cgroup/hugetlb.txt with the RFC?
> It would make clear what your patch does.

Will do in the next iteration.

>
> I wonder whether this should be under memory cgroup or not. In the 1st design
> of cgroup, I think it was considered one-feature-one-subsystem was good.
>
> But in recent discussion, I tend to hear that's hard to use.
> Now, memory cgroup has
>
> memory.xxxxx for controlling anon/file
> memory.memsw.xxxx for controlling memory+swap
> memory.kmem.tcp_xxxx for tcp controlling.
>
> How about memory.hugetlb.xxxxx ?
>

That is how I did it in one of the earlier versions of the patch. But there
are a few differences in the way we want to control hugetlb allocation. With
the hugetlb cgroup, we want to let an application fall back to the normal
page size when it would cross the cgroup limit, i.e. we need to enforce the
limit during mmap. memcg tracks cgroup details along with pages, hence
implementing the above gets challenging. Another difference is that we keep
the cgroup charged even if the task exits, as long as the file is present in
hugetlbfs. I.e., if an application did mmap with MAP_SHARED in hugetlbfs, the
file size will be extended to the length argument requested in mmap. This
file will consume pages from the hugetlb resource until it is truncated. We
want to track that resource usage as part of the hugetlb cgroup.
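
To make the mmap-time fallback concrete, here is a minimal userspace sketch
in C. It is only an illustration under some assumptions: it uses an anonymous
MAP_HUGETLB mapping rather than a file in hugetlbfs to keep it short, and the
helper name map_with_fallback() is made up for this example, it is not part
of the patchset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to back a private anonymous mapping with huge
 * pages. If the kernel refuses at mmap(2) time (for example because no huge
 * pages can be reserved), retry with normal pages.
 *
 * len should be a multiple of the huge page size.
 * *used_huge is set to 1 if huge pages were used, 0 otherwise.
 * Returns MAP_FAILED only if both attempts fail.
 */
static void *map_with_fallback(size_t len, int *used_huge)
{
    void *p;

    assert(used_huge != NULL);

#ifdef MAP_HUGETLB
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *used_huge = 1;
        return p;
    }
    /* The failure is reported here, at mmap time, not later at fault time. */
    perror("mmap(MAP_HUGETLB)");
#endif

    *used_huge = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

Because the failure is visible as an mmap error, the application can choose
the fallback itself. A limit enforced at fault time would show up as a signal
instead, which is much harder to recover from.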

From the interface point of view, what we have in the hugetlb cgroup is
similar to what is in memcg. We end up with files like the following:

hugetlb.16GB.limit_in_bytes
hugetlb.16GB.max_usage_in_bytes
hugetlb.16GB.usage_in_bytes
hugetlb.16MB.limit_in_bytes
hugetlb.16MB.max_usage_in_bytes
hugetlb.16MB.usage_in_bytes
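
The naming pattern above (one set of files per huge page size) can be
sketched as follows. This is a guess at how such names could be built, not
the code from the patchset: the helper hugetlb_ctl_name() is hypothetical,
and the choice of KB/MB/GB units is an assumption based only on the names
listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a control file name such as
 * "hugetlb.16MB.limit_in_bytes" from a huge page size in bytes and a
 * suffix. Returns 0 on success, -1 if the name does not fit in buf.
 */
static int hugetlb_ctl_name(unsigned long long page_size, const char *suffix,
                            char *buf, size_t buflen)
{
    const char *unit;
    unsigned long long count;
    int n;

    assert(suffix != NULL && buf != NULL);

    /* Pick the largest unit that the page size reaches (assumed scheme). */
    if (page_size >= (1ULL << 30)) {
        unit = "GB";
        count = page_size >> 30;
    } else if (page_size >= (1ULL << 20)) {
        unit = "MB";
        count = page_size >> 20;
    } else {
        unit = "KB";
        count = page_size >> 10;
    }

    n = snprintf(buf, buflen, "hugetlb.%llu%s.%s", count, unit, suffix);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For example, hugetlb_ctl_name(16ULL << 20, "limit_in_bytes", buf, sizeof(buf))
fills buf with "hugetlb.16MB.limit_in_bytes".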

>
> > The current patchset doesn't support cgroup hierarchy. We also don't
> > allow task migration across cgroup.
>
> What happens when a user destroys a cgroup which contains alive hugetlb pages ?
>
> Thanks,
> -Kame
>

Thanks
-aneesh
