Re: [PATCH] sched: Revert 14a40ffccd61 ("sched: replacePF_THREAD_BOUND with PF_NO_SETAFFINITY")

From: Ingo Molnar
Date: Mon Dec 02 2013 - 10:37:55 EST



* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> Subject: sched: Revert 14a40ffccd61 ("sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY")
>
> PF_NO_SETAFFINITY, which crudely turns off affinity control and cgroups
> access to tasks, and which is in use by every workqueue thread in Linux (!),
> is conceptually wrong on many levels:
>
> - We should strive to never consciously place artificial limitations on
> kernel functionality; our main reason to place limitations should be
> correctness.
>
> There are cases where limiting affinity is justified: for example the
> case of single cpu workqueue threads, which are special for their
> limited concurrency, esp. when coupled with per-cpu resources --
> allowing such threads to run on other cpus is a correctness violation
> and can crash the kernel.
>
> - But using it outside this case is overly broad; it dis-allows usage
> that is functionally fine and in some cases desired.
>
> In particular; tj argues ( http://lkml.kernel.org/r/20131128143848.GD3925@xxxxxxxxxxxxxx )
>
> "That's just inviting people to do weirdest things and then
> reporting things like "crypt jobs on some of our 500 machines end up
> stuck on a single cpu once in a while" which will eventually be
> tracked down to some weird shell script setting affinity on workers
> doing something else."
>
> While that is exactly what HPC/RT people _want_ in order to avoid
> disturbing the other CPUs with said crypt work.
>
> - Furthermore, the above example is also wrong in that you should not
> protect root from itself; there's plenty root can do to shoot his
> own foot off, let alone shoot his head off.
>
> Setting affinities is limited to root, and if root messes up the
> system he can keep the pieces. But limiting in an overly broad
> fashion stifles the functionality of the system.
>
> - Lastly; the flag actually blocks entry into cgroups as well as
> sched_setaffinity(), so the name is misleading at best.
>
> The right fix is to only set PF_THREAD_BOUND on per-cpu workers:
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
>
> set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
>
> - /* prevent userland from meddling with cpumask of workqueue workers */
> - worker->task->flags |= PF_NO_SETAFFINITY;
> -
> /*
> * The caller is responsible for ensuring %POOL_DISASSOCIATED
> * remains stable across this function. See the comments above the
> * flag definition for details.
> */
> - if (pool->flags & POOL_DISASSOCIATED)
> + if (pool->flags & POOL_DISASSOCIATED) {
> worker->flags |= WORKER_UNBOUND;
> + } else {
> + /*
> + * Prevent userland from meddling with cpumask of workqueue
> + * workers:
> + */
> + worker->task->flags |= PF_THREAD_BOUND;
> + }
>
> Therefore revert the patch and add an annoying but non-destructive warning
> check against abuse.

Hm, I missed these problems with the approach, but I think you are
right.

Tejun, I suspect you concur with Peter's analysis, can I also add
Peter's workqueue.c fixlet above to workqueue.c to this patch plus
your Acked-by, so that the two changes are together?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/