Re: [PATCH 4/7] sched: implement force_cpus_allowed()

From: Peter Zijlstra
Date: Mon Dec 07 2009 - 03:36:09 EST


On Mon, 2009-12-07 at 13:34 +0900, Tejun Heo wrote:
> Hello, Peter.
>
> On 12/04/2009 07:43 PM, Peter Zijlstra wrote:
> >>> force_cpus_allowed() will be used for concurrency-managed workqueue.
> >>
> >> Would still like to know why all this is needed.
> >
> > That is, what problem do these new-fangled workqueues have and why is
> > this a good solution.
>
> This is the original RFC posting of cmwq which includes the whole
> thing. I'm a few days away from posting a new version but the usage
> of force_cpus_allowed() remains the same.
>
> http://thread.gmane.org/gmane.linux.kernel/896268/focus=896294
>
> There are two tests which are bypassed by the force_ variant.
>
> * PF_THREAD_BOUND. This is used to mark tasks which are bound to a
> cpu using kthread_bind() to be bound permanently. However, new
> trustee based workqueue hotplugging decouples per-cpu workqueue
> flushing from cpu hot plug/unplugging. This is necessary because
> with cmwq, long running works can be served by regular workqueues,
> so delaying completion of hot plug/unplugging till certain works are
> flushed isn't feasible. So, what becomes necessary is the ability
> to re-bind tasks which have PF_THREAD_BOUND set but were unbound from
> their now-offline cpu when that cpu comes online again.

I'm not at all sure I like that. I'd be perfectly happy with delaying
the hot-unplug.

The whole cpu hotplug mess is tricky enough as it is and I see no
compelling reason to further complicate it. If people are really going
to enqueue strict per-cpu worklets (queue_work_on()) that take seconds
to complete, then they get to keep the results of that, which includes
slow hot unplug.

Having an off-line cpu still process code like it was online is asking
for trouble, don't go there.

> * cpu_active() test. CPU activeness encloses cpu online status which
> is the actual on/offline state. Workqueues need to keep running
> while a cpu is going down and with cmwq keeping workqueues running
> involves creation of new workers (this is part of the
> forward-progress guarantee, and one of the cpu down callbacks might end
> up waiting on completion of certain works).
>
> The problem with active state is that during cpu down, active going
> off doesn't mean all tasks have been migrated off the cpu, so
> without a migration interface which is synchronized with the actual
> offline migration, it is difficult to guarantee that all works
> either run on the designated cpu if that cpu is online or all
> run on other cpus if it is offline.
>
> Another related problem is that there's no way to monitor the cpu
> activeness change notifications.

cpu_active() is basically meant for the scheduler to not stick new tasks
on a dying cpu.
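
For reference, a rough user-space model of the two tests being discussed,
not the actual scheduler code. All model_* names are hypothetical, cpu masks
are plain 64-bit words, and the assumption that the force_ variant still
requires an online cpu is mine; the real check also has details (such as
exempting the current task) that are left out here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only; none of these names are kernel API. */
#define MODEL_PF_THREAD_BOUND 0x1u

struct model_task {
	unsigned int flags;
	uint64_t cpus_allowed;	/* bit n set => task may run on cpu n */
};

struct model_cpus {
	uint64_t online;	/* actual on/offline state */
	uint64_t active;	/* subset of online, cleared early on cpu down */
};

/*
 * Change the allowed mask of a task.  With force == false both tests
 * apply; with force == true both are bypassed and the new mask only has
 * to contain an online cpu (assumption, see above).
 */
static int model_set_cpus_allowed(struct model_task *p,
				  const struct model_cpus *cpus,
				  uint64_t new_mask, bool force)
{
	uint64_t usable;

	/* Test 1: a permanently bound task may not change its mask. */
	if (!force && (p->flags & MODEL_PF_THREAD_BOUND) &&
	    p->cpus_allowed != new_mask)
		return -EINVAL;

	/* Test 2: the new mask must contain a usable cpu. */
	usable = force ? cpus->online : cpus->active;
	if (!(new_mask & usable))
		return -EINVAL;

	p->cpus_allowed = new_mask;
	return 0;
}
```

With cpu 1 online but no longer active, the regular path refuses to move a
bound task there (-EINVAL) while the forced path accepts it.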

So on hot-unplug you'd want to splice your worklets to another cpu,
except maybe those strictly enqueued to the dying cpu, and since there
was work on the dying cpu, you already had a task processing them, so
you don't need new tasks, right?
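
Something like the following is what the splice could look like, as a
minimal user-space sketch rather than workqueue code. The model_* types and
functions are made up for illustration; the "strict" flag stands in for work
queued with queue_work_on(), and locking is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names are kernel API. */
struct model_work {
	struct model_work *next;
	bool strict;	/* queued for this cpu specifically */
};

struct model_queue {
	struct model_work *head;
	struct model_work **tail;	/* address of the terminating NULL link */
};

static void model_queue_init(struct model_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Append w at the end of q, preserving FIFO order. */
static void model_queue_add(struct model_queue *q, struct model_work *w)
{
	w->next = NULL;
	*q->tail = w;
	q->tail = &w->next;
}

/*
 * On hot-unplug, move every non-strict work from the dying cpu's queue
 * to the target queue.  Strict works stay behind for the worker that is
 * already running on the dying cpu.  Returns the number of works moved.
 */
static size_t model_splice_on_unplug(struct model_queue *dying,
				     struct model_queue *target)
{
	struct model_work *w = dying->head;
	size_t moved = 0;

	model_queue_init(dying);	/* rebuilt below from the strict works */
	while (w) {
		struct model_work *next = w->next;

		if (w->strict) {
			model_queue_add(dying, w);
		} else {
			model_queue_add(target, w);
			moved++;
		}
		w = next;
	}
	return moved;
}
```

The point being that the dying cpu only needs the worker it already has to
drain whatever is left on its queue.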



