Re: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.

From: Ben Greear
Date: Fri Jul 08 2011 - 13:18:38 EST


On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 15:49 -0700, greearb@xxxxxxxxxxxxxxx wrote:
>> From: Ben Greear<greearb@xxxxxxxxxxxxxxx>
>>
>> The rpc_killall_tasks logic is not locked against
>> the work-queue thread, but it still directly modifies
>> function pointers and data in the task objects.
>>
>> This patch changes the killall-tasks logic to set a flag
>> that tells the work-queue thread to terminate the task
>> instead of directly calling the terminate logic.
>>
>> Signed-off-by: Ben Greear<greearb@xxxxxxxxxxxxxxx>
>> ---
>>
>> NOTE: This needs review, as I am still struggling to understand
>> the rpc code, and it's quite possible this patch either doesn't
>> fully fix the problem or actually causes other issues. That said,
>> my nfs stress test seems to run a bit more stable with this patch applied.
>
> Yes, but I don't see why you are adding a new flag, nor do I see why we
> want to keep checking for that flag in the rpc_execute() loop.
> rpc_killall_tasks() is not a frequent operation that we want to optimise
> for.
>
> How about the following instead?

Ok, I looked at your patch more closely. I think it can still cause
bad race conditions.

For instance:

Assume that tk_callback is NULL at the beginning of the while loop in
__rpc_execute, and tk_action is rpc_exit_task.

While do_action(task) is being called, tk_action is set to NULL in rpc_exit_task.

But, right after tk_action is set to NULL in rpc_exit_task, rpc_killall_tasks
calls rpc_exit, which sets tk_action back to rpc_exit_task.

I believe this could cause the xprt_release(task) logic to be called in the
work-queue's execution of rpc_exit_task, because tk_action is non-NULL at a
point where it should be NULL.
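
To make the interleaving concrete, here is a minimal user-space sketch that
replays it deterministically. It is not the kernel code: the model_* names, the
window_hook field and the release counter are hypothetical, and it assumes
only what is described above, namely that the exit step clears tk_action and
later takes the release path if it finds tk_action non-NULL again.

```c
#include <assert.h>
#include <stddef.h>

struct model_task;
typedef void (*model_action)(struct model_task *);

/* Hypothetical stand-in for the task object; not the real struct rpc_task. */
struct model_task {
    model_action tk_action;    /* next state-machine step, NULL when finished */
    model_action window_hook;  /* hypothetical: what the other thread does in the race window */
    int release_calls;         /* how often the modeled xprt_release() path ran */
};

static void model_exit_task(struct model_task *task);

/* Models rpc_exit() as described above: it re-arms the exit action. */
static void model_rpc_exit(struct model_task *task)
{
    task->tk_action = model_exit_task;
}

/* Models the exit step: clear tk_action, then release only if it is set again. */
static void model_exit_task(struct model_task *task)
{
    task->tk_action = NULL;

    /* Race window: the killall path may run here on another CPU. */
    if (task->window_hook != NULL)
        task->window_hook(task);

    if (task->tk_action != NULL)
        task->release_calls++;  /* stands in for xprt_release(task) */
}

/* Runs one work-queue iteration and reports how often the release path ran. */
int replay_exit_race(int with_killall)
{
    struct model_task task = {
        model_exit_task,
        with_killall ? model_rpc_exit : NULL,
        0
    };
    model_action do_action = task.tk_action;

    assert(do_action != NULL);
    do_action(&task);  /* the do_action(task) call in the execute loop */
    return task.release_calls;
}
```

Calling replay_exit_race(0) models the undisturbed case, and
replay_exit_race(1) models the killall path landing inside the window. The
sketch only shows that the ordering is possible in this model; it does not
show that it happens on a real system.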

I have no hard evidence this exact scenario is happening in my case, but I
believe the code is still racy with your patch.

For that matter, is it safe to modify the flags in rpc_killall_tasks:

rovr->tk_flags |= RPC_TASK_KILLED;

Is that guaranteed to be atomic with respect to any other modification of
the flags?
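
The concern is that a plain |= is a load, an OR and a store, so two writers
can lose each other's bits. The user-space sketch below replays that
interleaving by hand and contrasts it with a C11 atomic OR. The MODEL_FLAG_*
values and function names are hypothetical, and C11 atomics are used only for
illustration; they are not what the kernel code uses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag bits; not the real RPC_TASK_* values. */
#define MODEL_FLAG_KILLED 0x0100u
#define MODEL_FLAG_OTHER  0x0001u

/* Replays two non-atomic "flags |= bit" updates whose steps interleave. */
unsigned int lost_update_replay(void)
{
    unsigned int flags = 0;
    unsigned int a, b;

    /* The two model bits must be distinct for the replay to be meaningful. */
    assert((MODEL_FLAG_KILLED & MODEL_FLAG_OTHER) == 0);

    a = flags;               /* writer A loads the old value */
    b = flags;               /* writer B loads the same old value */
    a |= MODEL_FLAG_KILLED;  /* A sets its bit in its private copy */
    b |= MODEL_FLAG_OTHER;   /* B sets its bit in its private copy */
    flags = a;               /* A stores */
    flags = b;               /* B stores and overwrites A's bit */
    return flags;
}

/* Same two updates done as atomic read-modify-write operations. */
unsigned int atomic_or_replay(void)
{
    atomic_uint flags;

    atomic_init(&flags, 0u);
    atomic_fetch_or(&flags, MODEL_FLAG_KILLED);
    atomic_fetch_or(&flags, MODEL_FLAG_OTHER);
    return atomic_load(&flags);
}
```

In the first function the KILLED bit is lost; in the second both bits survive
regardless of ordering, because each OR is a single indivisible operation.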

Thanks,
Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
