Re: [RFC PATCH v3 00/16] Core scheduling v3

From: Vineeth Remanan Pillai
Date: Mon Sep 30 2019 - 07:53:45 EST


On Thu, Sep 12, 2019 at 8:35 AM Aaron Lu <aaron.lu@xxxxxxxxxxxxxxxxx> wrote:
> >
> > I think comparing the parent's runtime will also have issues once
> > the task group has many more threads with different running
> > patterns. One example is a task group with many active threads
> > and one thread with fairly low activity. When this less active
> > thread competes with a thread in another group, there is a
> > chance that it loses continuously for a while until the other
> > group catches up on its vruntime.
>
> I actually think this is expected behaviour.
>
> Without core scheduling, when deciding which task to run, we will first
> decide which "se" to run from the CPU's root level cfs runqueue and then
> go downwards. Let's call the chosen se on the root level cfs runqueue
> the winner se. Then with core scheduling, we also need to compare the
> two winner "se"s of the two hyperthreads and choose the core wide winner "se".
>
Sorry, I misunderstood the fix and did not initially see the core wide
min_vruntime that you maintain in rq->core. This approach seems
reasonable. I think we can fix the potential starvation that you
mentioned in the comment by adjusting for the difference in all the
children cfs_rqs when we set the min_vruntime in rq->core. Since we take
the locks for both runqueues, it should be doable, and I am trying to
see how best to do that.

> >
> > As discussed during LPC, probably start thinking along the lines
> > of global vruntime or core wide vruntime to fix the vruntime
> > comparison issue?
>
> core wide vruntime makes sense when there are multiple tasks of
> different cgroups queued on the same core. e.g. when two
> tasks of cgroupA and one task of cgroupB are queued on the same core,
> assume cgroupA's one task is on one hyperthread and its other task is on
> the other hyperthread with cgroupB's task. With my current
> implementation or Tim's, cgroupA will get more time than cgroupB. If we
> maintain core wide vruntime for cgroupA and cgroupB, we should be able
> to maintain fairness between cgroups on this core. Tim propose to solve
> this problem by doing some kind of load balancing if I'm not mistaken, I
> haven't taken a look at this yet.
I think your fix is close to maintaining a core wide vruntime, since you
now have a single min_vruntime to compare across the siblings in the core.
To make the fix complete, we might need to adjust the whole tree's
min_vruntime, and I think that is doable.

Thanks,
Vineeth