RE: Scheduling tasks on idle cpu

From: David Laight
Date: Thu Apr 14 2022 - 04:35:45 EST


From: Vincent Guittot
> Sent: 14 April 2022 08:54
>
> On Thu, 14 Apr 2022 at 01:57, Qais Yousef <qais.yousef@xxxxxxx> wrote:
> >
> > On 04/12/22 11:07, Vincent Guittot wrote:
> > > On Tue, 12 Apr 2022 at 10:39, David Laight <David.Laight@xxxxxxxxxx> wrote:
> > > > Yes I want the CFS scheduler to pick an idle cpu in preference
> > > > to an active RT one.
> > >
> > > When task 34512 wakes up, scheduler checks if prev or this cpu are
> > > idle which is not the case for you. Then, it compares the load of prev
> > > and this_cpu and seems to select this_cpu (cpu17).
> > >
> > > Once cpu17 is selected, it will try to find an idle cpu which shares
> > > the LLC, but it seems that the scheduler didn't find one and finally
> > > keeps task 34512 on this_cpu.
> > >
> > > Note that during the next tick, a load balance will be triggered if
> > > this_cpu still has both the RT task and task 34512.
> >
> > David said there are idle cpus
> >
> > " There are two physical cpu with 20 cores each (with hyperthreading).
> > 16, 18, 34, 36 and 38 were idle. So both 16 and 18 should be on the
> > same NUMA node. All the others are running the same RT thread code. "
> >
> > Except for the possibility of them becoming idle just after the task has woken
> > up, shouldn't one of them have been picked?
>
> we don't loop on all cpus in the LLC to find an idle one but compute a
> reasonable number of iterations based on the avg_idle

Is there a way to dump the kernel NUMA/LLC tables?
This might be relevant (with everything idle):
# cat /proc/schedstat
version 15
timestamp 5388989193
cpu0 0 0 0 0 0 0 117226041384582 250531565354 206276873
domain0 00,00100001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 55,55555555 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 115978661288718 251736933814 297093280
domain0 00,00200002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 aa,aaaaaaaa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
All the later cpus follow the same pattern (domain0 shifts left by one bit for each cpu).

I could interpret that as meaning:
cpus n and (n + 20) are the hyperthreading pairs.
Even-numbered cpus are on one chip, odd-numbered on the other.
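
One way to check that reading is to decode the domain masks by hand. A minimal
Python sketch, assuming the masks in /proc/schedstat are comma-separated hex
words with the most significant word first (the helper name parse_cpumask is
made up for illustration, it is not a kernel or library function):

```python
def parse_cpumask(mask):
    """Decode a cpumask string such as '00,00100001' into a list of cpu numbers."""
    # Assumption: dropping the commas leaves a single hex number whose
    # bit N is set when cpu N belongs to the domain.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Masks copied from the schedstat output above.
masks = {
    "cpu0 domain0": "00,00100001",
    "cpu0 domain1": "55,55555555",
    "cpu1 domain0": "00,00200002",
    "cpu1 domain1": "aa,aaaaaaaa",
    "domain2": "ff,ffffffff",
}

for name, mask in masks.items():
    print(name, parse_cpumask(mask))
```

With these masks, domain0 decodes to cpu n plus cpu (n + 20), domain1 to all
the even (or all the odd) cpus, and domain2 to all 40 cpus.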

The migrate was:
34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
All the idle cpus were even-numbered.

> David can rerun his use case after disabling sched_feat(SIS_PROP)

How would I do that?

David
