Re: cgroup: rmdir() does not complete

From: Mark Hills
Date: Fri Sep 10 2010 - 03:52:01 EST


On Fri, 10 Sep 2010, KAMEZAWA Hiroyuki wrote:

> On Fri, 10 Sep 2010 08:28:00 +0100 (BST)
> Mark Hills <mark@xxxxxxxxxxx> wrote:
>
> > On Fri, 10 Sep 2010, KAMEZAWA Hiroyuki wrote:
> >
> > > On Fri, 10 Sep 2010 00:04:31 +0100 (BST)
> > > Mark Hills <mark@xxxxxxxxxxx> wrote:
> > > > The report on the spinning process (23586) is dominated by calls from
> > > > mem_cgroup_force_empty.
> > > >
> > > > It seems to show lru_add_drain_all and drain_all_stock_sync are causing
> > > > the load (I assume drain_all_stock_sync has been optimised out). But I
> > > > don't think this is as important as what causes the spin.
> > > >
> > >
> > > I noticed you use FUSE, and there seems to be a problem between FUSE and memcg.
> > > I wrote a patch (onto 2.6.36 but can be applied..)
> > >
> > > Could you try this? I'm sorry, I don't use a FUSE system and can't test it
> > > right now.
> >
> > What makes you conclude that FUSE is in use? I do not think this is the
> > case. Or do you mean that it is a problem that the kernel is built with
> > FUSE support?
> >
> You wrote
> > The test case I was running is similar to the above. With the Lustre
> > filesystem the problem takes 4 hours or more to show itself. Recently I
> > ran 4 threads for over 24 hours without it being seen -- I suspect some
> > external factor is involved.
>
> I think Lustre FS is using FUSE. Am I wrong?

Lustre does not use FUSE. But the client is a set of kernel modules, so
these could do anything.

> > I _can_ test the patch, but I still cannot reliably reproduce the problem
> > so it will be hard to conclude whether the patch works or not. Is there a
> > way to build a test case for this?
> >
>
> I'm sorry, I'm not sure yet. But from your report, you have 6 pages of charge
> which cannot be found by force_empty(). And I found that FUSE's pipe copy code
> inserts page cache pages into the radix-tree but does not move them onto the LRU.
>
> So,
> - There are remaining pages which are off the LRU.
> - FUSE's "move" code does something curious: add_to_page_cache() but no LRU insertion.
> - You reported that you use Lustre FS.
>
> So I asked you. To test this, I would have to study FUSE to write a test module...
> Maybe adding a printk() where I added the gfp_mask modification in fuse/dev.c
> could show something, but...
>
> We may have some other problem as well, but it seems this is one of them.

Okay, it sounds like I need to investigate Lustre; I will do this
next week. But I think FUSE can be ruled out.
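
For anyone who wants to check the same thing on their own machine, here is a
minimal sketch (not something that was run for this report). It scans a mount
table in /proc/mounts format and lists entries whose filesystem type marks them
as FUSE-backed. The function name fuse_mounts is hypothetical. The assumption
is that FUSE mounts report a type of "fuse", "fuseblk" or "fuse.<subtype>",
while a Lustre client mount reports "lustre".

```python
def fuse_mounts(mount_table):
    """Return (mount_point, fstype) pairs for FUSE-backed mounts.

    mount_table is text in /proc/mounts format:
        device mount_point fstype options dump pass
    """
    found = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        # "fusectl" is deliberately not matched: it is only the FUSE
        # control filesystem, not a user filesystem served over FUSE.
        if fstype in ("fuse", "fuseblk") or fstype.startswith("fuse."):
            found.append((mount_point, fstype))
    return found
```

Typical use would be fuse_mounts(open("/proc/mounts").read()). An empty list
only says that no FUSE filesystem is mounted; it says nothing about what the
Lustre client modules themselves do with the page cache.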

Thanks again

--
Mark