Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

From: Pavan Kondeti
Date: Mon Feb 17 2020 - 23:16:32 EST


On Mon, Feb 17, 2020 at 01:53:07PM +0000, Qais Yousef wrote:
> On 02/17/20 14:53, Pavan Kondeti wrote:
> > Hi Qais,
> >
> > On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:
> >
> > [...]
> >
> > > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > > index 0c8bac134d3a..5ea235f2cfe8 100644
> > > --- a/kernel/sched/rt.c
> > > +++ b/kernel/sched/rt.c
> > > @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > > {
> > > struct task_struct *curr;
> > > struct rq *rq;
> > > - bool test;
> > > + bool test, fit;
> > >
> > > /* For anything but wake ups, just return the task_cpu */
> > > if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> > > @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > > unlikely(rt_task(curr)) &&
> > > (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> > >
> > > - if (test || !rt_task_fits_capacity(p, cpu)) {
> > > + fit = rt_task_fits_capacity(p, cpu);
> > > +
> > > + if (test || !fit) {
> > > int target = find_lowest_rq(p);
> > >
> > > - /*
> > > - * Don't bother moving it if the destination CPU is
> > > - * not running a lower priority task.
> > > - */
> > > - if (target != -1 &&
> > > - p->prio < cpu_rq(target)->rt.highest_prio.curr)
> > > - cpu = target;
> > > + if (target != -1) {
> > > + /*
> > > + * Don't bother moving it if the destination CPU is
> > > + * not running a lower priority task.
> > > + */
> > > + if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > + cpu = target;
> > > +
> > > + } else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > + /*
> > > + * If the priority is the same and the new CPU
> > > + * is a better fit, then move, otherwise don't
> > > + * bother here either.
> > > + */
> > > + fit = rt_task_fits_capacity(p, target);
> > > + if (fit)
> > > + cpu = target;
> > > + }
> > > + }
> >
> > I understand that we are opting for the migration when priorities are tied but
> > the task can fit on the new task. But there is no guarantee that this task
> > stay there. Because any CPU that drops RT prio can pull the task. Then why
> > not leave it to the balancer?
>
> This patch does help in the 2 RT task test case. Without it I can see a big
> delay for the task to migrate from a little CPU to a big one, although the big
> is free.
>
> Maybe my test is too short (1 second). The delay I've seen is 0.5-0.7s..
>
> https://imgur.com/a/qKJk4w4
>
> Maybe I missed the real root cause. Let me dig more.
>
> >
> > I notice a case where tasks would migrate for no reason (happens without this
> > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > I don't know if it makes any difference but I see RT task placement is too
> > keen on reducing the migrations unless it is absolutely needed.
>
> In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> and prefer it if it is.
>
> But yeah I see it happening too
>
> https://imgur.com/a/FYqLIko
>
> Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> show that.
>
> I shouldn't have changed something to affect this bias. Do you think it's
> something I introduced?
>
> It's something maybe worth digging into though. I'll try to have a look.
>

The original RT task placement, i.e. without capacity awareness, places the task
on the previous CPU if the task can preempt the running task. I interpreted this
to mean that the higher prio RT task should get better treatment, even if that
results in stopping the lower prio RT task and migrating it to another CPU.
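
To make the condition concrete, here is a minimal standalone sketch of the
"test" expression in select_task_rq_rt(). The toy_* names are made up for
illustration and are not kernel symbols; struct toy_task only stands in for the
two task_struct fields the expression reads. Priorities use the kernel
numbering, where a lower value means a higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RT_PRIO 100	/* prio below this value means an RT task */

/* Hypothetical stand-in for the task_struct fields used by the check. */
struct toy_task {
	int prio;		/* lower value = higher priority */
	int nr_cpus_allowed;	/* size of the affinity mask */
};

/*
 * Mirrors the "test" expression: returns true when the waking task p
 * should look for another CPU instead of preempting curr on its
 * previous CPU.
 */
static bool toy_should_look_elsewhere(const struct toy_task *curr,
				      const struct toy_task *p)
{
	if (curr == NULL)
		return false;	/* nothing is running on the previous CPU */
	if (curr->prio >= MAX_RT_PRIO)
		return false;	/* curr is not an RT task, just preempt it */

	/* curr is pinned, or curr is at least as important as p */
	return curr->nr_cpus_allowed < 2 || curr->prio <= p->prio;
}
```

With a lower prio RT curr that is not pinned, the function returns false, so
the waking task stays on its previous CPU and preempts curr.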

Now coming to your patch (merged), we force find_lowest_rq() if the previous
CPU can't fit the task, even though the task could run there right away. When
the lowest mask returns an unfit CPU (with your new patch), we have two choices:
either place the task on this unfit CPU (which may involve a migration) or place
it on the previous CPU to avoid the migration. We are selecting the first
approach.

The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
does not have an RT task. If it is running a lower prio RT task than the
waking task, the lowest_mask may not contain the previous CPU.
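
A toy model of why that happens, assuming the lowest mask holds only the CPUs
at the least important priority level that is still below the waking task.
This is a simplification of what cpupri does, not the real implementation:
toy_lowest_mask() and TOY_NR_CPUS are made-up names, and all CPUs without an
RT task are folded into a single level.

```c
#include <stdbool.h>

#define MAX_RT_PRIO 100	/* prio below this value means an RT task */
#define TOY_NR_CPUS 4	/* hypothetical system size */

/*
 * cpu_prio[i] is the highest priority queued on CPU i in kernel
 * numbering (lower value = higher priority); any value >= MAX_RT_PRIO
 * means the CPU has no RT task. Returns a bitmask of candidate CPUs.
 */
static unsigned int toy_lowest_mask(const int cpu_prio[TOY_NR_CPUS],
				    int task_prio)
{
	unsigned int mask = 0;
	int lowest = -1;
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++) {
		/* Fold every non-RT CPU into one level. */
		int level = cpu_prio[i] >= MAX_RT_PRIO ? MAX_RT_PRIO
						       : cpu_prio[i];

		if (level <= task_prio)
			continue;	/* not lower priority than the task */

		if (level > lowest) {
			/* Found a less important level, restart the mask. */
			lowest = level;
			mask = 0;
		}
		if (level == lowest)
			mask |= 1u << i;
	}
	return mask;
}
```

For a waking task at prio 40 whose previous CPU 0 runs an RT task at prio 60,
with CPUs 1 and 2 running no RT task, the mask contains only CPUs 1 and 2. The
previous CPU is left out although the task could preempt there right away.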

I don't know if any workload is hurt by this change in behavior, so I am not
sure whether we have to restore the original behavior. If we do, something like
the below will do.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4043abe..c80d948 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1475,11 +1475,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
 		int target = find_lowest_rq(p);

 		/*
-		 * Don't bother moving it if the destination CPU is
-		 * not running a lower priority task.
+		 * Don't bother moving it
+		 *
+		 * - If the destination CPU is not running a lower priority task
+		 * - The task can't fit on the destination CPU and it can run
+		 *   right away on its previous CPU.
 		 */
-		if (target != -1 &&
-		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
+		if (target != -1 && target != cpu &&
+		    p->prio < cpu_rq(target)->rt.highest_prio.curr &&
+		    (test || rt_task_fits_capacity(p, target)))
 			cpu = target;
 	}
 	rcu_read_unlock();
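
For reference, the condition in the diff above can be exercised outside the
kernel with a small sketch like this. struct toy_wakeup and toy_pick_cpu() are
made-up names; the fields only stand in for the kernel state the condition
reads, and the sketch assumes we are already inside the
"if (test || !rt_task_fits_capacity(p, cpu))" branch.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the condition looks at. */
struct toy_wakeup {
	int prev_cpu;			/* the "cpu" argument */
	int target;			/* find_lowest_rq() result, -1 if none */
	int task_prio;			/* p->prio, lower value = higher prio */
	int target_highest_prio;	/* rt.highest_prio.curr of the target */
	bool test;			/* p can't run right away on prev_cpu */
	bool fits_target;		/* rt_task_fits_capacity(p, target) */
};

/* Returns the CPU the waking task would be placed on. */
static int toy_pick_cpu(const struct toy_wakeup *w)
{
	if (w->target != -1 && w->target != w->prev_cpu &&
	    w->task_prio < w->target_highest_prio &&
	    (w->test || w->fits_target))
		return w->target;

	/* Otherwise stay put and avoid the migration. */
	return w->prev_cpu;
}
```

The case of interest is test == false with fits_target == false: the task can
preempt on its previous CPU and the target is no better fit, so it stays where
it is instead of migrating to another unfit CPU.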

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.