Re: [PATCH 3/4] kthreads: rework kthread_stop()

From: Eric W. Biederman
Date: Mon Feb 02 2009 - 12:57:37 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 01/31, Rusty Russell wrote:
>>
>> On Friday 30 January 2009 23:20:58 Oleg Nesterov wrote:
>> > On 01/30, Oleg Nesterov wrote:
>> > >
>> > > With this patch kthread() allocates all neccesary data (struct kthread)
>> > > on its own stack, globals kthread_stop_xxx are deleted. ->vfork_done
>> > > is used as a pointer into "struct kthread", this means kthread_stop()
>> > > can easily wait for kthread's exit.
>>
>> > struct kthread {
>> > int should_stop;
>> > struct completion exited;
>> > };
>>
>> Mildly prefer bool in new code.
>
> OK, and
>
>> > #define to_kthread(tsk) \
>> > container_of((tsk)->vfork_done, struct kthread, exited)
>>
>> This needs a comment. Especially since to_xxx(yyy) is usually simply a
>> container_of(yyy, xxx, member). This one is special.
>
> OK, I'll send the cleanup patch.
>
>> > int kthread_stop(struct task_struct *k)
>> > {
>> > struct kthread *kthread;
>> > int ret;
>> >
>> > trace_sched_kthread_stop(k);
>> > get_task_struct(k);
>> >
>> > kthread = to_kthread(k);
>> > barrier(); /* it might have exited */
>> > if (k->vfork_done != NULL) {
>> > kthread->should_stop = 1;
>> > wake_up_process(k);
>> > wait_for_completion(&kthread->exited);
>> > }
>> > ret = k->exit_code;
>>
>> I don't think this works. How does do_exit() preserve a stack var, other
>> than for a few cycles longer? Sure, the vfork_done will be OK, but this code
>> here will not be. I think you'd need a get_task_struct(current) before the
>> do_exit(ret)
>
> I think this works ;)
>
> This stack frame can't disappear until
> __put_task_struct()->...->free_thread_info().
> So, if you have a reference to task_struct, then it it is safe to dereference
> to_kthread(task).

Correct.

> Before this patch, kthread_stop() can only be used when we know that kthread
> must not exit by its own. And with this patch we are safe in this case, note
> that kthread_stop() does get_task_struct() before it sets ->should_stop = 1.
> And this also pins the memory pointed by to_kthread().

>> (the case where the kthread fn calls do_exit() is fine: you're
>> not allowed to call kthread stop on such threads).
>
> This was not allowed, but now this is fine. Please look at the 4/4 patch.
> But, in that case you must pin the task_struct after kthread_create(),
> otherwise (with or without this patch) you just can't use this task_struct
> in any way.

To finish the conversion of everything to kthreads from kernel_thread
we need to be able to call kthread_stop on threads that call do_exit.
That kthread_stop cannot be called on such threads is currently a major
deficiency of the kthread api.

>> In which case using vfork_done is really just a convenience pointer inside
>> struct task_struct to stash the struct kthread. And that's horribly ugly,
>> which is why I stuck with a simple global. Changing to a linked-list of
> things
>> to stop would avoid the deadlock you mentioned where a kthread stops another
>> kthread.
>
> Well, this patch overloads ->vfork_done, and I agree this is a bit ugly.
> But what you suggest (if I undestand correctly) is more complex, and doesn't
> have any advantages, imho.

No. This patch does not overload or really abuse vfork_done.
vfork_done is named a bit misleadingly. It should arguably be called
mm_done. vfork_done (if set) always points to a completion that will
be completed when do_exit() -> mm_release() is called.

What we want is some completion called from inside of do_exit.
mm_release happens to be such a completion and the code already
exists.

If mm_release did not do: tsk->vfork_done = NULL then the entire
test of if (k->vfork_done != NULL) would be unnecessary.

Oleg on that note we should not need a barrier at all. We should be
able to simply say:

cmplp = k->vfork_done;
if (cmplp){
/* if vfork_done is NULL we have passed mm_release */
kthread = container_of(cmplp, struct kthread, exited);
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
}

Thinking of it I wish we had someplace we could store a pointer
that would not be cleared so we could remove that whole confusing
conditional. I just looked through task_struct and there doesn't
appear to be anything promising.

Perhaps we could rename vfork_done mm_done and not clear it in
mm_release.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/