Re: + per-cpuset-hugetlb-accounting-and-administration.patch added to -mm tree

From: Andrew Morton
Date: Fri May 04 2007 - 01:04:06 EST


On Thu, 3 May 2007 21:49:12 -0700 "Ken Chen" <kenchen@xxxxxxxxxx> wrote:

> On 5/3/07, Paul Jackson <pj@xxxxxxx> wrote:
> > Adding Christoph Lameter <clameter@xxxxxxx> to the cc list, as he knows
> > more about hugetlb pages than I do.
> >
> > This patch strikes me as a bit odd.
> >
> > Granted, it's solving what could be a touchy problem with a fairly
> > simple solution, which is usually a Good Thing(tm).
> >
> > However, the idea that different tasks would see different values for
> > the following fields in /proc/meminfo:
> >
> > HugePages_Total: 0
> > HugePages_Free: 0
> >
> > strikes me as odd, and risky. I would have thought that usually, all
> > tasks in the system should see the same values in the files in /proc
> > (as opposed to the files in particular task subdirectories /proc/<pid>.)
> >
> > This patch strikes me as a bit of a hack, good for compatibility, but
> > hiding a booby trap that will bite some user code in the long run.
> >
> > But I'm not enough of an expert to know what the right tradeoffs are
> > in this matter.
>
> Would annotating the HugePages_* fields with the name of the cpuset
> help?

There are existing programs which parse /proc/meminfo. If we're going to
do any of this then it would need to be via new fields.

I don't think we should be altering the meaning of the HugePages fields
like this. One can imagine scenarios in which such a change would cause
existing userspace scripts to fail. Plus it's Just Weird to use
/proc/meminfo in this manner.
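
To illustrate the compatibility concern, here is a minimal sketch of the
kind of parser an existing script might use. It is hypothetical (the
function name parse_meminfo and the sample text are made up for
illustration), but it shows that such code keys on the field name alone
and cannot tell whether the value has become cpuset-relative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Hypothetical helper for illustration; unit suffixes such as "kB"
    are dropped rather than converted.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Sample input made up for this example, not taken from a real system.
sample = (
    "MemTotal:        2048000 kB\n"
    "HugePages_Total:    50\n"
    "HugePages_Free:     50\n"
)
info = parse_meminfo(sample)
```

A script like this would read HugePages_Total the same way whether it
means "system-wide" or "in my cpuset", which is why a change of meaning
would go unnoticed until something downstream breaks.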

> I originally thought that since cpuset mems are hierarchical in memory
> assignment, it is fairly straightforward to understand what's going
> on: a parent cpuset's stats include its own and those of all of its
> children. For example, if the root cpuset has two sub-cpusets, job1
> and job2, with 20 and 30 htlb pages respectively, then querying at
> each level gives:
>
> [root@box]# echo $$ > /dev/cpuset/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 50
>
> [root@box]# echo $$ > /dev/cpuset/job1/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 20
>
> [root@box]# echo $$ > /dev/cpuset/job2/tasks
> [root@box]# grep HugePages_Total /proc/meminfo
> HugePages_Total: 30
>
> If this is odd, do you have any suggestions for alternative?

If it's per-cpuset information then shouldn't it be presented in
/dev/cpuset/something?
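
For reference, the hierarchical accounting Ken describes (a parent's
total covers its own pages plus those of all of its children) can be
sketched as below. This is a simplified model, not the patch's
implementation: the table and the name hugepages_total are hypothetical,
and the root is assumed to hold no huge pages of its own, which matches
the 50 = 20 + 30 in the quoted example.

```python
# Hypothetical cpuset tree: name -> (own huge pages, child names).
# Assumption: the root owns 0 pages itself, so its total is 20 + 30.
CPUSETS = {
    "/": (0, ["/job1", "/job2"]),
    "/job1": (20, []),
    "/job2": (30, []),
}


def hugepages_total(name, cpusets=CPUSETS):
    """Return the huge page count for a cpuset, including all descendants."""
    own, children = cpusets[name]
    return own + sum(hugepages_total(child, cpusets) for child in children)
```

Under these assumptions, querying "/", "/job1" and "/job2" gives 50, 20
and 30, the same numbers as the three grep results quoted above.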
