Re: [PATCH 2/2] orangefs: fix double-unlock issue in service_operation().

From: Al Viro
Date: Sat May 23 2020 - 01:41:57 EST


On Fri, May 22, 2020 at 11:35:51PM -0500, wu000273@xxxxxxx wrote:
> From: Qiushi Wu <wu000273@xxxxxxx>
>
> spin_unlock(&op->lock) is called before calling wake_up_interruptible().
> But spin_unlock() was called again after a call of the function
> "wait_for_matching_downcall" failed.

Yes, it was.

> Fix this issue by remove
> the second spin_unlock().

Why is that a bug? That's not an idle question - you could demonstrate
that if you had reproduced an unbalanced unlock experimentally, or you
could've proven it possible by analysis of the source.

The former ought to be clearly reported; the latter... AFAICS, your
reasoning is
	1) at the time of wait_for_matching_downcall() call the spinlock
is not being held, since we'd unlocked it upstream of that call and had
done nothing that could have reacquired it.
	2) after the return from that function we are doing unlock.
That is a bug, because one should not unlock a spinlock that is not
locked.

The gap in that proof is the unverified assumption that the locking
conditions upon return from wait_for_matching_downcall() are the same
as upon its call. IF that assumption holds, there is, indeed, a bug.
Now, a look at the function in question shows
	* a comment right before it claiming that it
" * Returns with op->lock taken.". Which might or might not be correct.
	* one of the wait_for_completion...() called; that clearly
indicates that no spinlocks should be held upon the entry.
	* unconditional spin_lock(&op->lock); right after that.
	* several predicates checked, apparently some debugging
output possibly produced and a value returned. The predicates
(op_state_serviced(), op_state_purged()) are clearly locking-neutral -
grep shows
fs/orangefs/orangefs-kernel.h:154:#define op_state_serviced(op) ((op)->op_state & OP_VFS_STATE_SERVICED)
fs/orangefs/orangefs-kernel.h:155:#define op_state_purged(op) ((op)->op_state & OP_VFS_STATE_PURGED)
so it's plain arithmetic. The same, of course, applies to
comparisons.

In other words, the function *does* acquire that spinlock and
does not release it, regardless of the value it returns. Which
means that your patch would very likely cause deadlocks.
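
To make the contract concrete, here is a minimal userspace sketch of the
pattern, not the orangefs code itself. Everything named toy_* (and
TOY_SERVICED) is hypothetical and exists only for illustration; the "lock"
is a plain flag that asserts on unbalanced use, standing in for a real
spinlock. It shows a callee that is entered without the lock and returns
with it held on every path, a caller that unlocks after it, and what is
left behind if that unlock is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: it only tracks whether it is held. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin forever here */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);	/* catches an unbalanced unlock */
	l->held = false;
}

#define TOY_SERVICED 1

struct toy_op {
	struct toy_lock lock;
	int state;
};

/*
 * Entered with op->lock NOT held (a sleeping wait would go first).
 * Returns with op->lock held, whatever the return value is.
 */
static int toy_wait_for_downcall(struct toy_op *op)
{
	toy_lock_acquire(&op->lock);
	if (op->state & TOY_SERVICED)	/* plain arithmetic, locking-neutral */
		return 0;
	return -1;			/* failure path: lock is still held */
}

/* Caller that honours the contract: the unlock below is balanced. */
static int toy_service(struct toy_op *op)
{
	int ret;

	toy_lock_acquire(&op->lock);
	/* ... queue the op ... */
	toy_lock_release(&op->lock);

	ret = toy_wait_for_downcall(op);
	toy_lock_release(&op->lock);	/* pairs with the acquire in the callee */
	return ret;
}

/* Same caller with the "second" unlock removed: the lock leaks. */
static int toy_service_patched(struct toy_op *op)
{
	toy_lock_acquire(&op->lock);
	toy_lock_release(&op->lock);
	return toy_wait_for_downcall(op);	/* returns with lock held */
}
```

With toy_service() the lock is free again after both the success and the
failure return; with toy_service_patched() it stays held, so the next
attempt to take it would never succeed. That is the deadlock in miniature.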