Re: [PATCH] vfs: Speed up deactivate_super for non-modular filesystems

From: Nick Piggin
Date: Wed May 09 2012 - 03:55:56 EST


On 8 May 2012 11:07, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>
>> On Mon, May 07, 2012 at 11:17:06PM +0100, Al Viro wrote:
>>> On Mon, May 07, 2012 at 02:51:08PM -0700, Eric W. Biederman wrote:
>>>
>>> > /proc and similar non-modular filesystems do not need a rcu_barrier
>>> > in deactivate_locked_super.  Being non-modular there is no danger
>>> > of the rcu callback running after the module is unloaded.
>>>
>>> There's more than just a module unload there, though - actual freeing
>>> struct super_block also happens past that rcu_barrier()...
>
> Al.  I have not closely audited the entire code path but at a quick
> sample I see no evidence that anything depends on inode->i_sb being
> rcu safe.  Do you know of any such location?
>
> It has only been a year and a half since Nick added this code which
> isn't very much time to have grown strange dependencies like that.

No, it has always depended on this.

Look at ncp_compare_dentry(), for example.


>> Is there anything in there for which synchronous operation is required?
>> If not, one approach would be to drop the rcu_barrier() calls to a
>> workqueue or something similar.
>
> We need to drain all of the rcu callbacks before we free the slab
> and unload the module.
>
> This actually makes deactivate_locked_super the totally wrong place
> for the rcu_barrier.  We want the rcu_barrier in the module exit
> routine where we destroy the inode cache.
>
> What I see as the real need is the filesystem modules need to do:
>         rcu_barrier()
>         kmem_cache_destroy(cache);
>
> Perhaps we can add some helpers to make it easy.  But I think
> I would be happy today with simply moving the rcu_barrier into
> every filesystem's module exit path, just before the file system
> module destroys its inode cache.

No, because that's not the only requirement for the rcu_barrier.

Making it asynchronous is not something I wanted to do, because
then we potentially have a process exiting from kernel space after
releasing last reference on a mount, but the mount does not go
away until "some time" later. Which is crazy.

However. We are holding vfsmount_lock for read at the point
where we ever actually do anything with an "rcu-referenced"
dentry/inode. I wonder if we could use this to get i_sb pinned.
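
For reference, the ordering Eric proposes above for a filesystem module's
exit path (drain pending RCU callbacks, then destroy the inode cache) can
be modelled in userspace. This is a toy sketch, not kernel code: every
toy_* name is hypothetical, and toy_rcu_barrier() simply runs the queued
callbacks instead of waiting for a grace period. It illustrates only the
slab-draining concern, not the i_sb lifetime issue.

```c
/* Userspace sketch only: all toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_cache {
	int live_objects;	/* allocated and not yet freed */
	int destroyed;		/* set once the cache is torn down */
};

struct toy_inode {
	struct toy_rcu_head rcu;	/* first member, so the head casts back */
	struct toy_cache *cache;
};

/* Deferred callbacks that have been queued but not yet run. */
static struct toy_rcu_head *pending;

/* Stand-in for call_rcu(): queue the callback, do not run it yet. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for rcu_barrier(): return only after every queued callback ran. */
static void toy_rcu_barrier(void)
{
	while (pending) {
		struct toy_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

static struct toy_inode *toy_alloc_inode(struct toy_cache *cache)
{
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->cache = cache;
	cache->live_objects++;
	return inode;
}

/* Deferred free: this is what must not run after the cache is gone. */
static void toy_free_inode_cb(struct toy_rcu_head *head)
{
	struct toy_inode *inode = (struct toy_inode *)head;

	inode->cache->live_objects--;
	free(inode);
}

/* Refuses (returns -1) while objects are still outstanding. */
static int toy_cache_destroy(struct toy_cache *cache)
{
	if (cache->live_objects != 0)
		return -1;
	cache->destroyed = 1;
	return 0;
}

/* Module-exit ordering from the quoted proposal: barrier, then destroy. */
static int toy_module_exit(struct toy_cache *cache)
{
	toy_rcu_barrier();
	return toy_cache_destroy(cache);
}
```

In this model, calling toy_cache_destroy() while frees are still queued
fails, whereas toy_module_exit() succeeds because the barrier drains them
first.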