On Wed, 2011-07-06 at 15:49 -0700, greearb@xxxxxxxxxxxxxxx wrote:From: Ben Greear<greearb@xxxxxxxxxxxxxxx>
The rpc_killall_tasks logic is not locked against
the work-queue thread, but it still directly modifies
function pointers and data in the task objects.
This patch changes the killall-tasks logic to set a flag
that tells the work-queue thread to terminate the task
instead of directly calling the terminate logic.
Signed-off-by: Ben Greear<greearb@xxxxxxxxxxxxxxx>
---
NOTE: This needs review, as I am still struggling to understand
the rpc code, and it's quite possible this patch either doesn't
fully fix the problem or actually causes other issues. That said,
my nfs stress test seems to run a bit more stable with this patch applied.
Yes, but I don't see why you are adding a new flag, nor do I see why we
want to keep checking for that flag in the rpc_execute() loop.
rpc_killall_tasks() is not a frequent operation that we want to optimise
for.
How about the following instead?
8<----------------------------------------------------------------------------------
From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
From: Trond Myklebust<Trond.Myklebust@xxxxxxxxxx>
Date: Wed, 6 Jul 2011 19:44:52 -0400
Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
Since rpc_killall_tasks may modify the rpc_task's tk_action field
without any locking, we need to be careful when dereferencing it.
+ do_action = task->tk_callback;
+ task->tk_callback = NULL;
+ if (do_action == NULL) {
/*
* Perform the next FSM step.
- * tk_action may be NULL when the task has been killed
- * by someone else.
+ * tk_action may be NULL if the task has been killed.
+ * In particular, note that rpc_killall_tasks may
+ * do this at any time, so beware when dereferencing.
*/
- if (task->tk_action == NULL)
+ do_action = task->tk_action;
+ if (do_action == NULL)
break;
- task->tk_action(task);
}
+ do_action(task);
/*
* Lockless check for whether task is sleeping or not.