Re: [PATCH 1/2] vfs: dcache: fix deadlock in tree traversal

From: Miklos Szeredi
Date: Tue Sep 18 2012 - 10:52:03 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Sep 17, 2012 at 1:39 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Egads... The problem is real and analysis, AFAICS, is correct, but result
>> is extremely ugly ;-/
>
> Agreed.
>
> The problem (or at least one *part* of the problem) is that the "goto
> rename_retry" case is done for two different entities entirely:
>
> - the "try_to_ascend()" failure path, which can happen even when
> renamelock is held for writing.
>
> - the "if we weren't write-locked before, and the read-lock failed"
> case (which obviously cannot happen if we already held things for
> writing)
>
> That said, I'm not sure why/how that try_to_ascend() could even fail
> when we're holding things locked. I guess it's the DCACHE_DISCONNECTED
> case that triggers.

Yes, with the test cases that IBM were using it is DCACHE_DISCONNECTED
case that triggers the double-lock. Trond was misusing
DCACHE_DISCONNECTED and this made the failure in try_to_ascend() much
more likely (and bogus). But there is a case, which is triggered rarely
if ever, when try_to_ascend() failure with rename_lock held is perfectly
valid.

try_to_ascend() does an ellaborate locking dance that aims to ensure
that child remains valid after having ascended to parent. It nees a
valid child because it needs the next sibling to continue tree
traversal. If the child becomes invalid (final dput()) then the whole
tree traversal needs to be restarted since we don't know where we left
off. So we basically do

1. take the RCU lock
2. unlock child
3. lock parent
4. check whether child has been freed, if yes restart traversal
5. release RCU
6. find next sibling (child->d_child.next)
7. lock new child

It is step 4. that needs the RCU guarantees so we can check child's
dentry flag (DCACHE_DISCONNECTED flag presently). After that d_lock on
parent will protect "child" from going away (d_kill or d_move) and we
can get hold of the next sibling.

Thanks,
Miklos

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/