Re: [PATCH] sched_ext: Fix cpu_released while RT task and SCX task are scheduled concurrently

From: 'Tejun Heo'
Date: Mon Jun 23 2025 - 15:51:07 EST


Hello,

On Sat, Jun 21, 2025 at 04:09:55AM +0000, liuwenfang wrote:
> Supposed RT task(rt1) is running on one CPU with its rq->scx.cpu_released
> set to true, if the rt1 becomes sleeping, then the scheduler will balance
> the remote SCX task(scx1) because there is no other RT task on its rq,
> and rq->scx.cpu_released is false. While one RT task(rt2) is placed on
> this rq(maybe rt2 wakeup or migration occurs) before the scx1 is enqueued,
> then the scheduler will pick rt2. At last, rt2 will be running on this cpu
> with rq->scx.cpu_released being false!
> The main reason is that consume_remote_task() will unlock rq lock.

This is rather difficult to follow. Can you please break this down into a
table? People often use a format like the following:

  CPU X                         CPU Y
  A does something
                                B does something else
  ...
                                ...
  Boom

> @@ -2470,6 +2471,11 @@ static inline void put_prev_set_next_task(struct rq *rq,
>
> prev->sched_class->put_prev_task(rq, prev, next);
> next->sched_class->set_next_task(rq, next, true);
> +
> +#ifdef CONFIG_SCHED_CLASS_EXT
> + if (scx_enabled())
> + switch_class(rq, next);
> +#endif

You're right that there is a race condition around this and I can't see a
way to solve this in SCX proper as there's no way for balance() to tell
whether a higher priority sched class has queued something while balance()
dropped the rq lock for migration, so adding a hook to
put_prev_set_next_task() seems like a reasonable solution. However, can you
please do the following?

- Improve the description so that the race condition is clearly
  understandable and explain why the extra hook in put_prev_set_next_task()
  is necessary.

- Rename switch_class() to something which fits the new location better -
  maybe scx_put_prev_set_next_task().

- If the function is called from put_prev_set_next_task(), it doesn't need
  to be called from put_prev_task_scx(). Drop that call.
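
To make the ordering concrete, here is a rough userspace model of the
sequence as I understand it from the report. It is only a sketch and not
kernel code: struct model_rq and every model_* name are made up for
illustration, the lock drop is reduced to a comment, and the flag handling
is simplified to "balance clears it, the hook sets it when a non-SCX task is
picked". It shows the flag ending up false while an RT task runs, and the
same sequence with a hook at the put_prev/set_next point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the sched classes involved. */
enum model_class { MODEL_CLASS_RT, MODEL_CLASS_SCX };

/* Hypothetical, heavily reduced stand-in for struct rq. */
struct model_rq {
	bool cpu_released;	/* models rq->scx.cpu_released */
	bool rt_queued;		/* an RT task is runnable on this rq */
	bool scx_queued;	/* an SCX task is runnable on this rq */
};

/* Models SCX balance: the CPU is reacquired, then a remote task is pulled. */
static void model_balance_scx(struct model_rq *rq, bool rt_wakes_in_window)
{
	rq->cpu_released = false;	/* SCX believes it owns the CPU again */

	/* rq lock dropped here for the remote migration (assumed window) */
	if (rt_wakes_in_window)
		rq->rt_queued = true;	/* rt2 wakes up or migrates in */
	/* rq lock reacquired */

	rq->scx_queued = true;		/* scx1 finally lands on this rq */
}

/* Models class priority: RT always wins over SCX. */
static enum model_class model_pick_next(const struct model_rq *rq)
{
	return rq->rt_queued ? MODEL_CLASS_RT : MODEL_CLASS_SCX;
}

/* Models the proposed hook: record that a higher class took the CPU. */
static void model_scx_put_prev_set_next_task(struct model_rq *rq,
					     enum model_class next)
{
	if (next != MODEL_CLASS_SCX)
		rq->cpu_released = true;
}

/* Returns true if an RT task ends up running with cpu_released == false. */
bool model_race_leaves_flag_stale(bool use_hook)
{
	struct model_rq rq = { .cpu_released = true };	/* rt1 was running */
	enum model_class next;

	model_balance_scx(&rq, true);	/* rt1 slept, balance pulls scx1 */
	next = model_pick_next(&rq);
	assert(next == MODEL_CLASS_RT);	/* rt2 is picked, not scx1 */

	if (use_hook)
		model_scx_put_prev_set_next_task(&rq, next);

	return next == MODEL_CLASS_RT && !rq.cpu_released;
}
```

Calling model_race_leaves_flag_stale(false) reports the stale flag, while
model_race_leaves_flag_stale(true) does not, which is the property the
patch description should spell out.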

Thanks.

--
tejun