Re: [tip:sched/core] sched: Fix ancient race in do_exit()

From: Oleg Nesterov
Date: Sun Jan 29 2012 - 11:13:53 EST


On 01/28, Linus Torvalds wrote:
>
> On Sat, Jan 28, 2012 at 4:03 AM, tip-bot for Yasunori Goto
> <y-goto@xxxxxxxxxxxxxx> wrote:
> >
> > sched: Fix ancient race in do_exit()
>
> Ugh.
>
> It would be much nicer to just clear the rwsem waiter->task thing
> *after* waking the task up, which would avoid this race entirely,
> afaik.

How? The problem is that wake_up_process(tsk) sees this task in
TASK_UNINTERRUPTIBLE state (the first "p->state & state" check in
try_to_wake_up), but then this task changes its state to TASK_DEAD
without schedule() and ttwu() does s/TASK_DEAD/TASK_RUNNING/.

IOW, the task doing

	current->state = TASK_A;
	...
	current->state = TASK_B;
	schedule();

can be woken up by try_to_wake_up(TASK_A), despite the fact it
sleeps in TASK_B. do_exit() is only "special" because it is not
easy to handle the spurious wakeup.
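
A minimal user-space sketch of that interleaving, replayed sequentially so it is deterministic. This is only a model, not kernel code: model_ttwu_check(), model_ttwu_commit() and racy_interleaving() are hypothetical names, the state values are illustrative constants, and the real try_to_wake_up() does far more (locking, barriers, runqueue handling).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state values for the model only. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2, TASK_DEAD = 64 };

/* Hypothetical stand-in for the one task_struct field we care about. */
struct task {
	int state;
};

/* Modelled step 1 of the wakeup: the "p->state & state" check. */
static bool model_ttwu_check(const struct task *p, int mask)
{
	return (p->state & mask) != 0;
}

/* Modelled step 2: the waker stores TASK_RUNNING without rechecking. */
static void model_ttwu_commit(struct task *p)
{
	p->state = TASK_RUNNING;
}

/*
 * Replay the problematic ordering and return the final state:
 *   waker: sees TASK_UNINTERRUPTIBLE, decides to wake
 *   task:  changes its own state to TASK_DEAD, no schedule() in between
 *   waker: overwrites the state with TASK_RUNNING
 */
static int racy_interleaving(void)
{
	struct task t = { .state = TASK_UNINTERRUPTIBLE };
	bool matched;

	matched = model_ttwu_check(&t, TASK_UNINTERRUPTIBLE);	/* waker */
	t.state = TASK_DEAD;					/* task  */
	if (matched)
		model_ttwu_commit(&t);				/* waker */

	return t.state;
}
```

In this model racy_interleaving() ends with TASK_RUNNING, not TASK_DEAD: the wakeup aimed at the old state has erased the new one.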

> Tell me, why wouldn't that work? rwsem_down_failed_common() does
>
> 	/* wait to be given the lock */
> 	for (;;) {
> 		if (!waiter.task)
> 			break;
> 		...
>
> so then we wouldn't need the task refcount crap in rwsem either etc,
> and we'd get rid of all races with wakeup.
>
> I wonder why we're clearing that whole waiter->task so early.

I must have missed something. I can't understand how this can help,
and "clear the rwsem waiter->task thing *after* waking" looks
obviously wrong. If we do this, then we can miss the "!waiter.task"
condition. The loop above actually does

	set_task_state(TASK_UNINTERRUPTIBLE);

	if (!waiter.task)
		break;
	schedule();

and

	wake_up_process(tsk);
	waiter->task = NULL;

can happen right after set_task_state().
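
To make the lost wakeup concrete, here is a small sequential model of one interleaving where the waiter's check falls between the waker's two steps. It is a sketch under assumptions, not the rwsem code: model_wake(), model_clear() and waiter_sleeps_forever() are hypothetical names, schedule() is reduced to "blocks unless the state is TASK_RUNNING", and no further wakeups are assumed to arrive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state values for the model only. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct task {
	int state;
};

/* Hypothetical stand-in for the rwsem waiter: only the ->task field. */
struct waiter {
	struct task *task;
};

/* Modelled wake_up_process(): the task becomes runnable. */
static void model_wake(struct task *p)
{
	p->state = TASK_RUNNING;
}

/* Modelled "waiter->task = NULL", i.e. the lock has been granted. */
static void model_clear(struct waiter *w)
{
	w->task = NULL;
}

/*
 * Interleaving replayed here:
 *   waker:  first step   (wake or clear, depending on the order)
 *   waiter: set_task_state(TASK_UNINTERRUPTIBLE)
 *   waiter: if (!waiter.task) break;
 *   waker:  second step
 *   waiter: schedule()
 * Returns true if the waiter would block with no wakeup left to come.
 */
static bool waiter_sleeps_forever(bool wake_before_clear)
{
	struct task t = { .state = TASK_RUNNING };
	struct waiter w = { .task = &t };

	if (wake_before_clear)
		model_wake(&t);		/* no effect, task is still running */
	else
		model_clear(&w);

	t.state = TASK_UNINTERRUPTIBLE;	/* waiter: set_task_state() */
	if (!w.task)
		return false;		/* waiter: sees the grant, breaks */

	if (wake_before_clear)
		model_clear(&w);	/* grant arrives after the check */
	else
		model_wake(&t);

	/* waiter: schedule() blocks unless someone made it runnable */
	return t.state != TASK_RUNNING;
}
```

In this model waiter_sleeps_forever(true) is true while waiter_sleeps_forever(false) is false: with the wake first, the waiter checks too early and then sleeps with nobody left to wake it.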

Oleg.
