Re: RFC [PATCH] SMP: Don't schedule tasks on inactive cpu(s)

From: Peter Zijlstra
Date: Wed Feb 29 2012 - 06:48:45 EST


On Wed, 2012-02-29 at 12:03 +0100, Peter Zijlstra wrote:
> On Wed, 2012-02-29 at 11:42 +0100, Jonas Aaberg wrote:
> > This patch removes the ability to schedule tasks on cpus that are online,
> > but not active. The reason for this patch is that during cpu hotplug
> > on ARM (at least) there is a short window where cpuX (X > 0) is online, but
> > busy-waiting on cpu0 to put it active, meanwhile cpu0 can be interrupted
> > and try to schedule something on the cpu that is busy checking its active bit.
>
> https://lkml.org/lkml/2011/12/15/255
>
> that one?
>
> I _think_ it's correct, but it would be so good if someone else could
> verify.

Relevant patches to consider are: e761b772 and 3a101d05.

Having looked at this again, I think we lost something in 3a101d05 since
it moves cpuset_update_active_cpus() from CPU_DEAD to CPU_DOWN_PREPARE
(and DOWN_FAILED) -- not that it matters that much. Also, this patch
leaves me somewhat puzzled as to what cpu_active_mask is for now.

The suggested patch linked above moves setting active to CPU_STARTING
which is _before_ online. It looks like some parts of the scheduler
don't look at online at all anymore so that opens a 'window' where we
could select a cpu that isn't part of the sched_domain nor online
(select_fallback_rq and cpuset_cpus_allowed_fallback).
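
To make that window concrete, here is a rough user-space toy model. It is
not kernel code: every toy_* name is made up for illustration, the two
bitmasks only stand in for cpu_online_mask and cpu_active_mask, and the
fallback loop is an assumption about the general shape of an "active only"
check, not a copy of select_fallback_rq().

```c
/* Toy user-space model of the online/active window. NOT kernel code;
 * all toy_* names are hypothetical and exist only for this sketch. */
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

struct toy_masks {
	unsigned int online;	/* stands in for cpu_online_mask */
	unsigned int active;	/* stands in for cpu_active_mask */
};

static inline bool toy_test_cpu(unsigned int mask, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return (mask >> cpu) & 1u;
}

/* Models the suggested patch: active gets set at "starting" time,
 * i.e. before the cpu is marked online. */
static inline void toy_cpu_starting(struct toy_masks *m, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	m->active |= 1u << cpu;
}

/* Models the later step where the cpu is marked online. */
static inline void toy_cpu_online(struct toy_masks *m, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	m->online |= 1u << cpu;
}

/* Fallback that only consults the active mask (the assumed problem case).
 * Returns the first allowed+active cpu, or -1 if there is none. */
static inline int toy_fallback_active_only(const struct toy_masks *m,
					   unsigned int allowed)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_test_cpu(allowed, cpu) && toy_test_cpu(m->active, cpu))
			return cpu;
	return -1;
}

/* Stricter variant that also requires the cpu to be online. */
static inline int toy_fallback_online_and_active(const struct toy_masks *m,
						 unsigned int allowed)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_test_cpu(allowed, cpu) &&
		    toy_test_cpu(m->active, cpu) &&
		    toy_test_cpu(m->online, cpu))
			return cpu;
	return -1;
}
```

With only cpu0 online and active, calling toy_cpu_starting() for cpu1 and
then asking for a fallback among {cpu1} makes the active-only variant hand
back cpu1 even though it is not online yet; the stricter variant refuses
until toy_cpu_online() has run.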

Now this isn't really a problem because of stop-machine: by the time
anybody gets to run again, both online and active are set and we should
be good to go. The bad part is of course us relying on this silly
stop-machine semantic.

Bah, hotplug is such a pain..