Re: [PATCH 1/2] Adds a read-only "procs" file similar to "tasks"that shows only unique tgids

From: KAMEZAWA Hiroyuki
Date: Fri Jul 03 2009 - 01:56:39 EST


On Thu, 2 Jul 2009 18:30:04 -0700
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 2 Jul 2009 18:08:29 -0700 Paul Menage <menage@xxxxxxxxxx> wrote:
>
> > On Thu, Jul 2, 2009 at 5:53 PM, Andrew Morton<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >> In the first snippet, count will be at most equal to length. As length
> > >> is determined from cgroup_task_count, it can be no greater than the
> > >> total number of pids on the system.
> > >
> > > Well that's a problem, because there can be tens or hundreds of
> > > thousands of pids, and there's a fairly low maximum size for kmalloc()s
> > > (include/linux/kmalloc_sizes.h).
> > >
> > > And even if this allocation attempt doesn't exceed KMALLOC_MAX_SIZE,
> > > large allocations are less unreliable. __There is a large break point at
> > > 8*PAGE_SIZE (PAGE_ALLOC_COSTLY_ORDER).
> >
> > This has been a long-standing problem with the tasks file, ever since
> > the cpusets days.
> >
> > There are ways around it - Lai Jiangshan <laijs@xxxxxxxxxxxxxx> posted
> > a patch that allocated an array of pages to store pids in, with a
> > custom sorting function that let you specify indirection rather than
> > assuming everything was in one contiguous array. This was technically
> > the right approach in terms of not needing vmalloc and never doing
> > large allocations, but it was very complex; an alternative that was
> > mooted was to use kmalloc for small cgroups and vmalloc for large
> > ones, so the vmalloc penalty wouldn't be paid generally. The thread
> > fizzled AFAICS.
>
> It's a problem which occurs fairly regularly. Some sites are fairly
> busted. Many gave up and used vmalloc(). Others use an open-coded
> array-of-pages thing.
>
> This happens enough that I expect the kernel would benefit from a
> general dynamic-array library facility. Something whose interface
> mimics the C-level array operations but which is internally implemented
> via some data structure which uses PAGE_SIZE allocations. Probably a
> simple two-level thing would suffice.
>
I think both of kmalloc usage here are very bad.

Why we can't do what readdir(/proc) does ? I'm sorry I misunderstand.
Following is an easy example.


0. at open, inilialize f_pos to 0. f_pos is used as "pid"
remember "css_set with hole" as template in f_private?(or somewhere) at open
...like this.
--
struct cgroupfs_root *root = cgrp->root;
struct cgroup *template = kzalloc(sizeof(void*) * CGROUP_SUBSYS_COUNT);

for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
if (root->subsys_bits & (1UL << i))
template[i] = cgrp->subsys[i];
--


1. at read(), find task_struct of "pid" in f_pos.
2. look up task_struct of "pid" and compare with f_private
--
struct cgroup *template = f_private;

for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
if (!template[i])
contiue;
if (template[i] != task_subsys_state(task, i))
break;
}
if (i == CGROUP_SUBSYS_COUNT)
print task;
--

4. f_pos++ until filling seq_buffer.


Thanks,
-Kame





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/