Re: [PATCH] sched/fair: Fix fault in reweight_entity

From: Tadeusz Struk
Date: Wed Jan 19 2022 - 21:22:22 EST


On 1/19/22 07:43, Tadeusz Struk wrote:
Looks like after this change there is a time window, when
task_struct->se.cfs_rq can be NULL. This can be exploited to trigger
null-ptr-deref by calling setpriority on that task.
Looks like isn't good enough, either there is, in which case you explain
the window, or there isn't in which case what are we doing here?

There surely is something wrong, otherwise it wouldn't crash.
I will try to narrow down the reproducer to better understand what causes
the fault.

The race is between sched_post_fork() and setpriority(PRIO_PGRP)
The scenario is that the main process spawns 3 new threads,
which then call setpriority(PRIO_PGRP, 0, -20), wait, and exit.
For each of the new thread the copy_process() gets invoked,
which then calls sched_fork() and finally sched_post_fork().

There is a possibility that setpriority(PRIO_PGRP)->set_one_prio() will be
called for a thread in the group that is just being created by copy_process(),
and for which the sched_post_fork() has not been executed yet.
This will trigger a null pointer dereference in reweight_entity()
because it will try to access the CFS run queue pointer, which hasn't been set, resulting it a crash as below:

KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 PID: 2392 Comm: reduced_repro Not tainted 5.16.0-11201-gb42c5a161ea3-dirty #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:reweight_entity+0x15d/0x440
RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006
Call Trace:
<TASK>
reweight_task+0xde/0x1c0
set_load_weight+0x21c/0x2b0
set_user_nice.part.0+0x2d1/0x519
set_user_nice.cold+0x8/0xd
set_one_prio+0x24f/0x263
__do_sys_setpriority+0x2d3/0x640
__x64_sys_setpriority+0x84/0x8b
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
---[ end trace 9dc80a9d378ed00a ]---

Before the mentioned change the rq pointer has been set in sched_fork(),
which is called much earlier in copy_process() as opposed to sched_post_fork(),
before the new task is added to the thread_group.

A stripped down version of the sysbot reproducer can be found here:
https://termbin.com/axkq

I can consistently reproduce the issue with it in 2-3 runs.

The solution is either we set the pointer p->se.cfs_rq to a dummy rq in sched_fork(), or return from the set_one_prio() without doing anything
if the rq is NULL, as it is done in the patch.
I will update the description and resend it tomorrow.

--
Thanks,
Tadeusz