Re: [PATCH 17/46] fs: Use rename lock and RCU for multi-step operations

From: Nick Piggin
Date: Tue Jan 18 2011 - 17:42:41 EST


On Wed, Jan 19, 2011 at 9:32 AM, Yehuda Sadeh Weinraub
<yehudasa@xxxxxxxxx> wrote:
> On Sat, Nov 27, 2010 at 1:44 AM, Nick Piggin <npiggin@xxxxxxxxx> wrote:
>> The remaining uses of dcache_lock are to allow atomic, multi-step read-side
>> operations over the directory tree by excluding modifications to the tree,
>> and to walk in the leaf->root direction in the tree, where we don't have
>> a natural d_lock ordering.
>>
>> This could be accomplished by taking every d_lock, but this would mean a
>> huge number of locks and actually gets very tricky.
>>
>> Solve this instead by using the rename seqlock for multi-step read-side
>> operations, retrying in case of a rename so we don't walk up the wrong parent.
>> Concurrent dentry insertions are not serialised against.  Concurrent deletes
>> are tricky when walking up the directory: our parent might have been deleted
>> while we dropped locks, so we also need to check and retry for that.
>>
>> We can also take the rename lock in cases where livelock is a worry (this
>> is introduced in a subsequent patch).
>>
>> Signed-off-by: Nick Piggin <npiggin@xxxxxxxxx>
> ..
>> @@ -237,6 +238,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
>>        __releases(dcache_inode_lock)
>>        __releases(dcache_lock)
>>  {
>> +       dentry->d_parent = NULL;
>>        list_del(&dentry->d_u.d_child);
>>        if (parent)
>>                spin_unlock(&parent->d_lock);
>
> There's an issue with ceph as it references the
> dentry->d_parent(->d_inode) at dentry_release(), so setting
> dentry->d_parent to NULL here doesn't work with ceph. Though there is
> some workaround for it, we would like to be sure that this one is
> really required so that we don't exacerbate the ugliness. The
> workaround is to keep a pointer to the parent inode in the private
> dentry structure, which will be referenced only at the .release()
> callback. This is clearly not ideal.

Hmm, I'll have to think about it. I think we can probably check for
d_count == 0 rather than parent != NULL?
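
For anyone following the quoted description, here is a rough userspace
sketch of the retry pattern it describes: walk leaf->root under a sequence
counter, and retry if a rename happened or if the node we are on was killed
in the meantime. This is not the kernel code. struct seq, struct node,
try_depth() and walk_depth() are all made-up names, C11 atomics stand in
for the kernel's seqlock, and a NULL parent marks a killed node as in the
quoted d_kill() hunk (the d_count == 0 variant is not modelled). It also
assumes nodes are never freed during a walk, and it simply gives up after a
few tries rather than falling back to taking the rename lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rename seqlock: odd = writer active. */
struct seq { atomic_uint sequence; };

/* Hypothetical stand-in for a dentry: the root points to itself,
 * and a killed node has parent == NULL (as in the quoted hunk). */
struct node { struct node *_Atomic parent; };

enum walk_status { WALK_OK, WALK_RETRY };

static void seq_init(struct seq *s) { atomic_init(&s->sequence, 0); }

static unsigned seq_read_begin(struct seq *s)
{
    unsigned v;
    /* Wait until no writer is in progress (counter is even). */
    while ((v = atomic_load_explicit(&s->sequence, memory_order_acquire)) & 1u)
        ;
    return v;
}

static bool seq_read_retry(struct seq *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

static void node_init_root(struct node *n) { atomic_store(&n->parent, n); }
static void node_attach(struct node *c, struct node *p) { atomic_store(&c->parent, p); }
static void node_kill(struct node *n) { atomic_store(&n->parent, NULL); }

/* A "rename": reparent a node inside a write-side critical section. */
static void node_rename(struct seq *s, struct node *c, struct node *new_parent)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_acq_rel);
    atomic_store(&c->parent, new_parent);
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* One attempt at counting the edges from start up to the root. */
static enum walk_status try_depth(struct seq *s, struct node *start, int *depth_out)
{
    assert(s != NULL && start != NULL && depth_out != NULL);

    unsigned seq = seq_read_begin(s);
    struct node *n = start;
    int depth = 0;

    for (;;) {
        struct node *p = atomic_load_explicit(&n->parent, memory_order_relaxed);
        if (p == NULL)
            return WALK_RETRY;      /* node was killed under us */
        if (p == n)
            break;                  /* reached the root */
        n = p;
        depth++;
    }

    if (seq_read_retry(s, seq))
        return WALK_RETRY;          /* a rename raced with the walk */

    *depth_out = depth;
    return WALK_OK;
}

/* Bounded retry loop; returns -1 if every attempt had to be retried. */
static int walk_depth(struct seq *s, struct node *start, int max_tries)
{
    int depth;
    for (int i = 0; i < max_tries; i++)
        if (try_depth(s, start, &depth) == WALK_OK)
            return depth;
    return -1;
}
```

The point of splitting try_depth() from walk_depth() is that the single
attempt never blocks writers; only the caller decides how long to keep
retrying before switching to something heavier.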