Re: [PATCH] Fix deadlock in RPC scheduling code.

From: Trond Myklebust
Date: Fri Mar 10 2006 - 10:21:59 EST


On Fri, 2006-03-10 at 16:10 +0100, Aurelien Degremont wrote:
> Trond Myklebust wrote:
>
> > The real fix is the one I posted in response to this thread last week.
>
> Oops, I missed it.
>
> Ok for the patch, the list iteration will be better, but I don't
> understand how this will prevent the race condition?
>
> I do not think it is not a good idea to keep this lock order in
> rpc_wake_up_task() anyway. I must be missing something but I
> think this function should be modified in order to be in accordance with
> the lock hierarchy in rpc code. It seems to me that the potential race
> is still there.
>
> Even if we cannot certify task->u.tk_wait.rpc_waitq is valid, the
> current kernel code cannot either (err... I think it can't). So let's
> try at least to improve it, even if we cannot set it totally harmless.
> Warn me if I'm wrong :
> When rpc_wake_up_task() is called, the calling context is helpless.
> So we have absolutely no information on the task queue. We must
> atomically check the "queued-ness" of the task and grab the queue lock
> to prevent any error? Hmmm... So the matter is : the queue mustn't be
> modified between the test and the lock? Have we some "magical" lock
> somewhere which could help up? I didn't find it.

Yes. The RPC_TASK_QUEUED bit can only be cleared when both the
RPC_TASK_WAKEUP bit _and_ the queue spinlock are held.
If you are holding either one of those two, then it is safe to test for
RPC_IS_QUEUED(). If the latter is true, then it is also safe to
dereference the value of task->u.tk_wait.rpc_waitq.

Cheers,
Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/