RE: [RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.

From: Myklebust, Trond
Date: Fri Jul 08 2011 - 18:15:02 EST


> -----Original Message-----
> From: Ben Greear [mailto:greearb@xxxxxxxxxxxxxxx]
> Sent: Friday, July 08, 2011 6:03 PM
> To: Myklebust, Trond
> Cc: linux-nfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
> rpc_killall_tasks.
>
> On 07/08/2011 11:11 AM, Myklebust, Trond wrote:
> >> -----Original Message-----
> >> From: Ben Greear [mailto:greearb@xxxxxxxxxxxxxxx]
> >> Sent: Friday, July 08, 2011 1:19 PM
> >> To: Myklebust, Trond
> >> Cc: linux-nfs@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
> >> rpc_killall_tasks.
> >>
> >> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> >>> On Wed, 2011-07-06 at 15:49 -0700, greearb@xxxxxxxxxxxxxxx wrote:
> >>>> From: Ben Greear<greearb@xxxxxxxxxxxxxxx>
> >>>>
> >>>> The rpc_killall_tasks logic is not locked against
> >>>> the work-queue thread, but it still directly modifies
> >>>> function pointers and data in the task objects.
> >>>>
> >>>> This patch changes the killall-tasks logic to set a flag
> >>>> that tells the work-queue thread to terminate the task
> >>>> instead of directly calling the terminate logic.
> >>>>
> >>>> Signed-off-by: Ben Greear<greearb@xxxxxxxxxxxxxxx>
> >>>> ---
> >>>>
> >>>> NOTE: This needs review, as I am still struggling to understand
> >>>> the rpc code, and it's quite possible this patch either doesn't
> >>>> fully fix the problem or actually causes other issues. That said,
> >>>> my nfs stress test seems to run a bit more stable with this patch
> >> applied.
> >>>
> >>> Yes, but I don't see why you are adding a new flag, nor do I see
> why
> >> we
> >>> want to keep checking for that flag in the rpc_execute() loop.
> >>> rpc_killall_tasks() is not a frequent operation that we want to
> >> optimise
> >>> for.
> >>>
> >>> How about the following instead?
> >>
> >> Ok, I looked at your patch closer. I think it can still cause
> >> bad race conditions.
> >>
> >> For instance:
> >>
> >> Assume that tk_callback is NULL at beginning of while loop in
> >> __rpc_execute,
> >> and tk_action is rpc_exit_task.
> >>
> >> While do_action(task) is being called, tk_action is set to NULL in
> >> rpc_exit_task.
> >>
> >> But, right after tk_action is set to NULL in rpc_exit_task, the
> >> rpc_killall_tasks
> >> method calls rpc_exit, which sets tk_action back to rpc_exit_task.
> >>
> >> I believe this could cause the xprt_release(task) logic to be called
> in
> >> the
> >> work-queue's execution of rpc_exit_task due to tk_action != NULL
> when
> >> it should not be.
> >
> > Why would this be a problem? xprt_release() can certainly be called
> multiple times on an rpc_task. Ditto rpbc_getport_done.
> >
> > The only thing that is not re-entrant there is rpcb_map_release,
> which should only ever be called once whether or not something calls
> rpc_killall_tasks.
>
>
> From the trace I posted, this stack trace below is being
> called with the void *data object already freed.
>
> One way for this to happen would be to have rpc_exit_task call task-
> >tk_ops->rpc_call_done
> more than once (I believe). Two calls to rpc_exit_task could do that,
> and since the
> rpc_exit_task method is assigned to tk_action, I *think* the race I
> mention above could cause
> rpc_exit_task to be called twice.
>
> [<ffffffff81105907>] print_trailer+0x131/0x13a
> [<ffffffff81105945>] object_err+0x35/0x3e
> [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
> [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
> [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
> [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
> [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
> [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
> [<ffffffff81061343>] process_one_work+0x230/0x41d
> [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
> [<ffffffff8106379f>] worker_thread+0x133/0x217
> [<ffffffff8106366c>] ? manage_workers+0x191/0x191
> [<ffffffff81066f9c>] kthread+0x7d/0x85
> [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
> [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
> [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
> [<ffffffff81485ee0>] ? gs_change+0x13/0x13

The calldata gets freed in the rpc_final_put_task() which shouldn't ever be run while the task is still referenced in __rpc_execute

IOW: it should be impossible to call rpc_exit_task() after rpc_final_put_task
N‹§²æìr¸›yúèšØb²X¬¶ÇvØ^–)Þ{.nÇ+‰·¥Š{±‘êçzX§¶›¡Ü}©ž²ÆzÚ&j:+v‰¨¾«‘êçzZ+€Ê+zf£¢·hšˆ§~†­†Ûiÿûàz¹®w¥¢¸?™¨è­Ú&¢)ßf”ù^jÇy§m…á@A«a¶Úÿ 0¶ìh®å’i