Re: [PATCH] cgroup: fix dentry still in use bug when dropping cssrefs after umount

From: Tejun Heo
Date: Tue Jul 03 2012 - 13:03:19 EST


Hello, Li.

On Sat, Jun 30, 2012 at 03:07:55PM +0800, Li Zefan wrote:
> If there's a css ref dangling after umount, when the ref goes down
> to 0, dput will drop the cgroup's dentry and then drop the root
> dentry. But kill_sb will be called inbetween, as dropping cgroup dentry
> unpinned sb, and now vfs will find the root dentry's refcnt is not 0.

Ah, okay, the window is between dput of child and parent. Now I can
reproduce. :) Dunno why your case didn't work here.

> @@ -78,8 +79,8 @@ struct cgroup_subsys_state {
> /* ID for this css, if possible */
> struct css_id __rcu *id;
>
> - /* Used to put @cgroup->dentry on the last css_put() */
> - struct work_struct dput_work;
> + /* Used to put @cgroup->css_refs on the last css_put() */
> + struct work_struct put_work;
> };
>
> /* bits in struct cgroup_subsys_state flags field */
> @@ -170,6 +171,9 @@ struct cgroup {
> */
> atomic_t count;
>
> + /* We won't destroy the cgroup when there are css refs */
> + struct kref css_refs;
> +
> /*
> * We link our 'sibling' struct into our parent's 'children'.
> * Our children link their 'sibling' into our 'children'.

Hmm... adding another layer of refcnting. Now we end up with three
layers of refcnting - the active usage via cgroup->count, cgroup
lifetime via cgroup->dentry->d_count and this extended lifecycle
refcnt. In addition, we now have to deal with partially destroyed
cgroups - e.g. running cgroup_path() will oops on lingering cgroups.
Hating this tangling of dentries and cgroups more and more. :(

So, the reason why the d_release() change caused this problem was
because it decoupled super deactivation from rmdir. Before, dentries
were holding onto super the same but as rmdir also holds onto super
via normal vfs access, the whole recurssion was protected. ie.

1. rmdir(2) - gets s_active

2. cgroup waits for css drain

3. dput() happens on cgroup synchronously. This ends up
calling deactivate_super() via d_iput() and then may
recursively put parent dentries but that's safe thanks to
s_active ref from rmdir(2).

4. rmdir(2) is done, super deactivated.

After super deactivation is moved to d_release(), the last dput may
happen outside rmdir(2) or any other vfs syscall, so between dput of
child and root, which doesn't hold s_active, super can try to die.

Ugh....... I don't know. I'll think more about it.

Thanks a lot.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/