Re: [Bug #11989] Suspend failure on NForce4-based boards due tochanes in stop_machine

From: Oleg Nesterov
Date: Tue Nov 11 2008 - 10:33:38 EST


On 11/11, Vegard Nossum wrote:
>
> I think that the test for stop_machine_data in stop_cpu() should not
> have been moved from __stop_machine(). Because now cpu_online_map may
> change in-between calls to stop_cpu() (if the callback tries to
> online/offline CPUs), and the end result may be different.

I don't think this is possible, the callback must not be called unless
all threads ack (at least) the STOPMACHINE_PREPARE state.


Off-topic question, __stop_machine() does:

/* Schedule the stop_cpu work on all cpus: hold this CPU so one
* doesn't hit this CPU until we're ready. */
get_cpu();
for_each_online_cpu(i) {
sm_work = percpu_ptr(stop_machine_work, i);
INIT_WORK(sm_work, stop_cpu);
queue_work_on(i, stop_machine_wq, sm_work);
}
/* This will release the thread on our CPU. */
put_cpu();

Don't we actually need preempt_disable/preempt_enable instead of
get/put cpu? (yes, there the same currently). We don't care about
the CPU we are running on, and it can't go away until we queue all
works. But we must ensure that stop_cpu() on the same CPU can't
preempt us, right?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/