Re: [RFC] Thread Migration Preemption - v4

From: Oleg Nesterov
Date: Sat Jul 14 2007 - 16:23:45 EST


On 07/14, Mathieu Desnoyers wrote:
>
> @@ -4891,10 +4948,42 @@ static int migration_thread(void *data)
> list_del_init(head->next);
>
> spin_unlock(&rq->lock);
> - __migrate_task(req->task, cpu, req->dest_cpu);
> + migrated = __migrate_task(req->task, cpu, req->dest_cpu);
> local_irq_enable();
> -
> - complete(&req->done);
> + if (!migrated) {
> + /*
> + * If the process has not been migrated, let it run
> + * until it reaches a migration_check() so it can
> + * wake us up.
> + */
> + spin_lock_irq(&rq->lock);
> + head = &rq->migration_queue;
> + list_add(&req->list, head);
> + if (req->task->se.on_rq
> + || !task_migrate_count(req->task)) {
> + /*
> + * The process is on the runqueue, it could
> + * exit its critical section at any moment,
> + * don't race with it and retry actively.
> + * Also, if the thread is not on the runqueue
> + * and has a zero migration count
> + * (__migrate_task failed because cpus allowed
> + * changed), just retry.
> + */
> + spin_unlock_irq(&rq->lock);
> + continue;

Again, this can deadlock. migration_thread() is SCHED_FIFO, and it shares the
same CPU with req->task. We busy-wait here, so req->task may never get a
chance to run and finish its critical section.
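
To make the starvation concrete, here is a toy user-space model (not kernel
code). It assumes one CPU and a strict-priority policy in which the
high-priority task, standing in for migration_thread(), always wins the CPU
while it is runnable. The names simulate_one_cpu() and struct sim_result are
invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one CPU, strict priority, no time slicing. */
struct sim_result {
    int  low_prio_ticks;   /* ticks the low-priority task actually ran */
    bool section_finished; /* did it leave its critical section? */
};

/*
 * high_sleeps == false: the high-priority task retries in a loop
 *                       (the "continue" path) and stays runnable.
 * high_sleeps == true:  it blocks and lets the other task run.
 * work is the number of ticks the low-priority task needs to finish
 * its critical section.
 */
struct sim_result simulate_one_cpu(bool high_sleeps, int work, int max_ticks)
{
    struct sim_result r = { 0, false };
    bool high_runnable = true;

    assert(work > 0 && max_ticks >= 0);

    for (int tick = 0; tick < max_ticks && !r.section_finished; tick++) {
        if (high_runnable) {
            /* The highest-priority runnable task always gets the tick. */
            if (high_sleeps)
                high_runnable = false;  /* give up the CPU and wait */
            /* else: busy-wait, stay runnable, burn the tick */
            continue;
        }
        /* Only reached while the high-priority task is blocked. */
        r.low_prio_ticks++;
        if (r.low_prio_ticks >= work)
            r.section_finished = true;
    }
    return r;
}
```

With high_sleeps == false the low-priority task gets zero ticks no matter how
large max_ticks is, which is the deadlock described above. With
high_sleeps == true it finishes after `work` ticks.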

> + }
> + set_tsk_thread_flag(req->task, TIF_NEED_MIGRATE);
> + set_current_state(TASK_INTERRUPTIBLE);
> + spin_unlock_irq(&rq->lock);
> + /*
> + * Wait for the process currently in its critical
> + * section.
> + */
> + wake_up_process(req->task);
> + schedule();

As Peter pointed out, while we sleep here we hold up all other migration
requests.

And worse, this can hang forever (until another req comes). do_check_migrate()
can wake_up() the wrong rq.

set_current_state(TASK_INTERRUPTIBLE) is bogus: we can miss a wakeup from
(say) sched_migrate_task().
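
For reference, a generic user-space model of the lost-wakeup pattern (not taken
from the patch). It assumes the usual rules: a wakeup sets the task state back
to running, and a schedule() call only puts the task to sleep if its state is
not running. If the sleeper inspects its queue first and sets its state
afterwards, a request posted in between is missed. All names here
(struct sleeper, post_request(), model_schedule(), lost_wakeup()) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { RUNNING, INTERRUPTIBLE };

struct sleeper {
    enum task_state state;
    int pending;              /* queued requests */
};

/* Waker side: queue a request, then mark the task runnable. */
static void post_request(struct sleeper *t)
{
    t->pending++;
    t->state = RUNNING;
}

/* Model of schedule(): the task sleeps only if its state is not RUNNING. */
static bool model_schedule(const struct sleeper *t)
{
    return t->state != RUNNING;
}

/*
 * The sleeper performs two steps, then calls model_schedule().
 * set_state_first == true:  step 0 sets the state, step 1 checks the queue.
 * set_state_first == false: step 0 checks the queue, step 1 sets the state.
 * The waker fires just before step waker_step (2 = just before schedule).
 * Returns true if the task ends up asleep with a request still queued.
 */
bool lost_wakeup(bool set_state_first, int waker_step)
{
    struct sleeper t = { RUNNING, 0 };
    bool saw_work = false;

    assert(waker_step >= 0 && waker_step <= 2);

    for (int step = 0; step < 2; step++) {
        if (step == waker_step)
            post_request(&t);
        if ((step == 0) == set_state_first)
            t.state = INTERRUPTIBLE;
        else
            saw_work = t.pending > 0;
    }
    if (waker_step == 2)
        post_request(&t);

    if (saw_work)
        return false;         /* request is handled instead of sleeping */
    return model_schedule(&t) && t.pending > 0;
}
```

In this model lost_wakeup(false, 1) is the only losing interleaving: the queue
looked empty, the wakeup hit a task that was still running, and the state was
set afterwards. Setting the state before the check closes that window for every
waker_step.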

Oleg.
