Re: [PATCH-tip v4] sched: Fix NULL user_cpus_ptr check in dup_user_cpus_ptr()

From: Will Deacon
Date: Tue Nov 29 2022 - 09:08:37 EST


On Mon, Nov 28, 2022 at 10:11:52AM -0500, Waiman Long wrote:
> On 11/28/22 07:00, Will Deacon wrote:
> > On Sun, Nov 27, 2022 at 08:43:27PM -0500, Waiman Long wrote:
> > > On 11/24/22 21:39, Waiman Long wrote:
> > > > In general, a non-null user_cpus_ptr will remain set until the task dies.
> > > > A possible exception to this is the fact that do_set_cpus_allowed()
> > > > will clear a non-null user_cpus_ptr. To allow for this possible race
> > > > condition, we need to check for NULL user_cpus_ptr under the pi_lock
> > > > before duping the user mask.
> > > >
> > > > Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
> > > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> > > This is actually a pre-existing use-after-free bug since commit 07ec77a1d4e8
> > > ("sched: Allow task CPU affinity to be restricted on asymmetric systems").
> > > So it needs to be fixed in the stable release as well. Will resend the patch
> > > with an additional fixes tag and updated commit log.
> > Please can you elaborate on the use-after-free here? Looking at
> > 07ec77a1d4e8, the mask is only freed in free_task() when the usage refcount
> > has dropped to zero and I can't see how that can race with fork().
> >
> > What am I missing?
>
> I missed that at first. The current task cloning process copies the content
> of the task structure over to the newly cloned/forked task. IOW, if
> user_cpus_ptr had been set up previously, it will be copied over to the
> cloned task. Now if user_cpus_ptr of the source task is cleared right after
> that and before dup_user_cpus_ptr() is called, the obsolete user_cpus_ptr
> value in the cloned task will remain and get used even if it has been freed.
> That is what I call a use-after-free and double-free.
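
For concreteness, the sequence described above can be sketched in userspace
C roughly as follows. This is a toy, single-threaded model and not kernel
code: struct task and every toy_* helper are hypothetical names made up for
illustration, and the two steps are run back to back instead of actually
racing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for task_struct; only the field under discussion is modelled. */
struct task {
	unsigned long *user_cpus_ptr;	/* heap-allocated mask, or NULL */
};

/* Hypothetical model of the structure copy done when cloning a task:
 * pointer members are copied as-is, so both tasks share one allocation. */
static void toy_dup_task_struct(struct task *dst, const struct task *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Hypothetical model of the parent's user mask being cleared. The old
 * pointer is returned so the caller controls when it is freed. */
static unsigned long *toy_clear_user_mask(struct task *t)
{
	unsigned long *old = t->user_cpus_ptr;

	t->user_cpus_ptr = NULL;
	return old;
}

/* Replays the interleaving: copy first, clear second. Returns 1 if the
 * child still holds the pointer the parent has dropped, -1 on allocation
 * failure, 0 otherwise. */
static int toy_child_has_stale_ptr(void)
{
	struct task parent = { .user_cpus_ptr = malloc(sizeof(unsigned long)) };
	struct task child;
	unsigned long *old;
	int stale;

	if (!parent.user_cpus_ptr)
		return -1;
	*parent.user_cpus_ptr = 0x3UL;		/* arbitrary example mask */

	toy_dup_task_struct(&child, &parent);	/* step 1: child copies the pointer */
	assert(child.user_cpus_ptr == parent.user_cpus_ptr);

	old = toy_clear_user_mask(&parent);	/* step 2: parent clears its mask */

	/* Compare before freeing; the child's copy is never dereferenced. */
	stale = (parent.user_cpus_ptr == NULL && child.user_cpus_ptr == old);

	free(old);	/* from here on, child.user_cpus_ptr would dangle */
	return stale;
}
```

In this model toy_child_has_stale_ptr() returns 1: the child is left with a
pointer to memory the parent has already released, which is the situation
the quoted text calls a use-after-free.
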

If the parent task can be modified concurrently with dup_task_struct() then
surely we'd have bigger issues because that's not going to be atomic? At the
very least we'd have a data race, but it also feels like we could end up
with inconsistent task state in the child. In fact, couldn't the normal
'cpus_mask' be corrupted by a concurrent set_cpus_allowed_common()?

Or am I still failing to understand the race?

Will