Re: [PATCH v7 2/2] sched/rt: Trying to push current task when target disable migrating

From: Schspa Shi
Date: Sun Aug 28 2022 - 12:28:11 EST



Dietmar Eggemann <dietmar.eggemann@xxxxxxx> writes:

> On 13/07/2022 15:48, Schspa Shi wrote:
>> When the task to push disable migration, retry to push the current
>> running task on this CPU away, instead doing nothing for this migrate
>> disabled task.
>>
>> Signed-off-by: Schspa Shi <schspa@xxxxxxxxx>
>
> Unfortunately, I can't recreate this issue on my Arm64 6 CPUs system on
> mainline or PREEMPT_RT (linux-5.19.y-rt and v5.10.59-rt52) (the one you
> mentioned in v6.)
>
> With an rt-app rt workload of 12-18 periodic rt-tasks (4/16ms) all with
> different priorities I never ran into a `is_migration_disabled(task)`
> situation. I only ever get `task_rq(task) != rq` or `task_running(rq,
> task)` under the `if (double_lock_balance(rq, lowest_rq))` condition in
> find_lock_lowest_rq().
>

I think we need to write a kernel module that adds more hard IRQ context
in order to increase the probability of reproducing it.

I have never been able to recreate this issue with my own test case
either. Our test team can reproduce it, though: they have more machines
to test on, and the problem is easier to hit while CPUs are being
hotplugged.
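
For anyone who wants to add hotplug pressure while the rt-app workload
runs, here is a minimal userspace sketch. It assumes the standard sysfs
control file /sys/devices/system/cpu/cpuN/online, root privileges, and a
kernel built with CONFIG_HOTPLUG_CPU. The helper names are made up for
illustration; this is not the harness our test team uses.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs hotplug control path for one CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
static int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (online) or "0" (offline) to the control file.
 * Returns 0 on success, -1 on any error (no such CPU, no permission, ...). */
static int set_cpu_online(unsigned int cpu, int online)
{
	char path[64];
	FILE *f;
	int ret = 0;

	if (cpu_online_path(path, sizeof(path), cpu))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(online ? "1" : "0", f) == EOF)
		ret = -1;
	/* The write reaches the kernel on flush, so check fclose() too. */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}

/* Take one CPU offline and bring it back, 'rounds' times.
 * Stops at the first failure, which may leave the CPU offline. */
static int hotplug_stress(unsigned int cpu, unsigned int rounds)
{
	unsigned int i;

	for (i = 0; i < rounds; i++) {
		if (set_cpu_online(cpu, 0))
			return -1;
		if (set_cpu_online(cpu, 1))
			return -1;
	}

	return 0;
}
```

Calling hotplug_stress(3, 1000) from a small main() while the RT tasks
are running would cycle CPU 3 a thousand times. On some systems cpu0 has
no online file and cannot be toggled this way.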


> [...]
>
>> // XXX validate p is still the highest prio task
>> if (task_rq(p) == rq) {
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index cb3b886a081c..21af20445e7f 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -2335,6 +2335,15 @@ static int push_dl_task(struct rq *rq)
>> */
>> task = pick_next_pushable_dl_task(rq);
>> if (task == next_task) {
>> + /*
>> + * If next task has now disabled migrating, see if we
>> + * can push the current task.
>> + */
>> + if (unlikely(is_migration_disabled(task))) {
>> + put_task_struct(next_task);
>> + goto retry;
>> + }
>> +
>
> Looks like for DL this makes no sense since we're not pushing rq->curr
> in `retry:` like for RT in case `is_migration_disabled(next_task)`.
>

It seems we still get the opportunity to execute resched_curr() there,
which will have a similar effect. I will update the comment accordingly
in the next patch version.
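
As a rough illustration of when that reschedule can happen, here is a
small userspace model of the check at the top of the retry path in
push_dl_task(), as I read it in mainline. The struct and the model_*
names are simplified stand-ins, not kernel definitions, and the sketch
leaves out the WARN_ON() and the rest of the function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the few task fields the check reads (hypothetical type). */
struct model_task {
	bool is_dl;           /* models dl_task(p) */
	uint64_t deadline;    /* models p->dl.deadline */
	int nr_cpus_allowed;  /* models p->nr_cpus_allowed */
};

/* Wraparound-safe "a is earlier than b", the same idea as dl_time_before(). */
static bool model_dl_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* True when the retry path would call resched_curr(rq) and return:
 * curr is a deadline task, next has the earlier deadline, and curr
 * is allowed to run on more than one CPU. */
static bool model_retry_would_resched(const struct model_task *curr,
				      const struct model_task *next)
{
	return curr->is_dl &&
	       model_dl_time_before(next->deadline, curr->deadline) &&
	       curr->nr_cpus_allowed > 1;
}
```

So the "similar effect" only applies when the migration-disabled next
task would preempt rq->curr and rq->curr itself can move; in the other
cases the retry falls through to the normal push attempt.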

> [...]
>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>

--
BRs
Schspa Shi