Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided cpumask in sched_setaffinity()

From: Will Deacon
Date: Tue Jan 17 2023 - 11:08:48 EST


Hi Waiman,

On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
> The user_cpus_ptr field was added by commit b90ca8badbd1 ("sched:
> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
> is currently used only by the arm64 arch due to possible asymmetric CPU
> setups. This patch extends its usage to save the user-provided cpumask
> when sched_setaffinity() is called, for all arches. With this patch
> applied, user_cpus_ptr, once allocated after a successful call to
> sched_setaffinity(), will only be freed when the task exits.
>
> Since user_cpus_ptr is supposed to be used for "requested
> affinity", there is actually no point in saving the current cpu affinity
> in restrict_cpus_allowed_ptr() if sched_setaffinity() has never been
> called. Modify the logic to set user_cpus_ptr only in sched_setaffinity()
> and to use it, if set, in restrict_cpus_allowed_ptr() and
> relax_compatible_cpus_allowed_ptr() without modifying it.
>
> There will be some changes in behavior for arm64 systems with
> asymmetric CPUs in some corner cases. For instance, if
> sched_setaffinity() has never been called and there is a cpuset change
> before relax_compatible_cpus_allowed_ptr() is called, the subsequent
> call will follow what the cpuset allows rather than what the previous
> cpu affinity setting allowed.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
> kernel/sched/core.c | 82 ++++++++++++++++++++------------------------
> kernel/sched/sched.h | 7 ++++
> 2 files changed, 44 insertions(+), 45 deletions(-)

We've tracked this down as the cause of an arm64 regression in Android and I've
reproduced the issue with mainline.

Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
the command-line, then the arch code will (amongst other things) call
force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
when exec()'ing a 32-bit or a 64-bit task respectively.
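
For reference, the arch hook looks roughly like this (paraphrased from
arch/arm64/kernel/process.c and heavily simplified, so don't take it as the
exact mainline code):

void arch_setup_new_exec(void)
{
	/* Static key set when "allow_mismatched_32bit_el0" is on the cmdline */
	if (static_branch_unlikely(&arm64_mismatched_32bit_el0)) {
		if (is_compat_task())
			/* 32-bit exec: clamp affinity to 32-bit-capable CPUs */
			force_compatible_cpus_allowed_ptr(current);
		else
			/* 64-bit exec: undo any earlier clamping */
			relax_compatible_cpus_allowed_ptr(current);
	}
}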

If you consider a system where everything is 64-bit but the cmdline option
above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
expected to do anything, and the old code made sure of that:

> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>
> /*
> * Restore the affinity of a task @p which was previously restricted by a
> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
> - * @p->user_cpus_ptr.
> + * call to force_compatible_cpus_allowed_ptr().
> *
> * It is the caller's responsibility to serialise this with any calls to
> * force_compatible_cpus_allowed_ptr(@p).
> */
> void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> {
> - struct cpumask *user_mask = p->user_cpus_ptr;
> - unsigned long flags;
> + int ret;
>
> /*
> - * Try to restore the old affinity mask. If this fails, then
> - * we free the mask explicitly to avoid it being inherited across
> - * a subsequent fork().
> + * Try to restore the old affinity mask with __sched_setaffinity().
> + * Cpuset masking will be done there too.
> */
> - if (!user_mask || !__sched_setaffinity(p, user_mask))
> - return;

... since it returned early here if '!user_mask' ...

> -
> - raw_spin_lock_irqsave(&p->pi_lock, flags);
> - user_mask = clear_user_cpus_ptr(p);
> - raw_spin_unlock_irqrestore(&p->pi_lock, flags);
> -
> - kfree(user_mask);
> + ret = __sched_setaffinity(p, task_user_cpus(p));
> + WARN_ON_ONCE(ret);

... however, now we end up going down into __sched_setaffinity() with
task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
ends up with the affinity mask of the online CPUs at the point of exec() and is
unable to run on anything onlined later).
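
For reference, the task_user_cpus() helper that this patch adds to
kernel/sched/sched.h falls back to cpu_possible_mask whenever user_cpus_ptr
has never been set (quoting from memory, so double-check against the tree):

static inline const struct cpumask *task_user_cpus(struct task_struct *p)
{
	if (!p->user_cpus_ptr)
		return cpu_possible_mask; /* &init_task.cpus_mask */
	return p->user_cpus_ptr;
}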

I've had a crack at fixing the code above to restore the old behaviour, and it
seems to work for my basic tests (still pending confirmation from others):


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bb1ee6d7bdde..0d4a11384648 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
{
struct affinity_context ac = {
- .new_mask = task_user_cpus(p),
+ .new_mask = p->user_cpus_ptr,
.flags = 0,
};
- int ret;

/*
* Try to restore the old affinity mask with __sched_setaffinity().
* Cpuset masking will be done there too.
*/
- ret = __sched_setaffinity(p, &ac);
- WARN_ON_ONCE(ret);
+ if (ac.new_mask)
+ WARN_ON_ONCE(__sched_setaffinity(p, &ac));
}

void set_task_cpu(struct task_struct *p, unsigned int new_cpu)


With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr(),
so I'd be inclined to remove it altogether tbh; rough sketch below.
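
Something like this, untested and with hunk contexts elided (and assuming the
cpumask_and() in restrict_cpus_allowed_ptr() is the only remaining caller):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ (in restrict_cpus_allowed_ptr())
-	if (!cpumask_and(new_mask, task_user_cpus(p), subset_mask)) {
+	/* Open-code the old task_user_cpus() fallback */
+	if (!cpumask_and(new_mask, p->user_cpus_ptr ?: cpu_possible_mask,
+			 subset_mask)) {
 		err = -EINVAL;
 		goto err_unlock;
 	}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ (drop the now-unused helper)
-static inline const struct cpumask *task_user_cpus(struct task_struct *p)
-{
-	if (!p->user_cpus_ptr)
-		return cpu_possible_mask; /* &init_task.cpus_mask */
-	return p->user_cpus_ptr;
-}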

What do you think?

Will