Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

From: Mike Galbraith
Date: Thu Nov 03 2011 - 09:01:05 EST


On Thu, 2011-11-03 at 09:18 +0100, Ingo Molnar wrote:
>
> ( Sorry about the delay in the reply - folks are returning from and
> recovering from the Kernel Summit ;-) I've extended the Cc: list.
> Please Cc: scheduler folks when reporting bugs, next time around. )
>
> * Artem S. Tashkinov <t.artem@xxxxxxxxx> wrote:
>
> > Hello,
> >
> > It's known that if you want to reach maximum performance on HT
> > enabled Intel CPUs you should distribute the load evenly between
> > physical cores, and when you have loaded all of them you should
> > then load the remaining virtual cores.
> >
> > For example, if you have 4 physical cores and 8 virtual CPUs then
> > if you have just four tasks consuming 100% of CPU time you should
> > load four CPU pairs:
> >
> > VCPUs: {1,2} - one task running
> >
> > VCPUs: {3,4} - one task running
> >
> > VCPUs: {5,6} - one task running
> >
> > VCPUs: {7,8} - one task running
> >
> > It's absolutely detrimental to performance to bind two tasks to
> > two physical cores, e.g. {1,2} and {3,4}, and then the remaining
> > two tasks to a single third core, e.g. {5,6}:
> >
> > VCPUs: {1,2} - one task running
> >
> > VCPUs: {3,4} - one task running
> >
> > VCPUs: {5,6} - *two* tasks running
> >
> > VCPUs: {7,8} - no tasks running
> >
> > I've found out that even on Linux 3.0.8 the process scheduler
> > doesn't correctly distribute the load amongst virtual CPUs. E.g.
> > on a 4-core system (8 virtual CPUs total) the process scheduler
> > often runs some of the four different tasks on the same
> > physical CPU.
> >
> > Maybe I shouldn't trust top/htop output on this matter, but the same
> > test carried out on Microsoft Windows XP shows that it indeed
> > distributes the load correctly, running tasks on different physical
> > cores whenever possible.
> >
> > Any thoughts? comments? I think this is quite a serious problem.
>
> If sched_mc is set to zero then this looks like a serious load
> balancing bug - you are perfectly right that we should balance
> between physical packages first, and ending up with the kind of
> asymmetry you describe for any observable length of time is a bug.
>
> You have not outlined your exact workload - do you run a simple CPU
> consuming loop with no sleeping done whatsoever, or something more
> complex?
>
> Peter, Paul, Mike, any ideas?

SD_SHARE_PKG_RESOURCES is on in the SIBLING domain, so in the sync hint
wakeup case (given no other tasks running to muddy the water), the hint
allows us to do an affine wakeup, which allows select_idle_sibling() to
convert the CPU affine wakeup into a cache affine wakeup: the waker ends
up on one CPU, the wakee on its sibling.  Turning SD_SHARE_PKG_RESOURCES
off results in sync wakeup pairs landing CPU affine instead.
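
The {1,2} {3,4} pairing in the example above isn't guaranteed on every
box; the kernel exports the actual sibling map in sysfs, which is worth
checking before reading anything into top/htop CPU numbers.  A minimal
sketch, assuming the usual
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list layout:

/* Print each logical CPU's HT siblings so top/htop CPU numbers can be
 * mapped back to physical cores. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[128], buf[64];
	int cpu;

	for (cpu = 0; cpu < 4096; cpu++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
			 cpu);
		f = fopen(path, "r");
		if (!f)
			break;	/* first missing CPU ends the scan */
		if (fgets(buf, sizeof(buf), f)) {
			buf[strcspn(buf, "\n")] = '\0';
			printf("cpu%d: siblings %s\n", cpu, buf);
		}
		fclose(f);
	}
	return 0;
}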

!sync wakeups spread to separate cores unless the number of threads
exceeds the number of cores.
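
A bare-bones way to exercise the !sync case (just a sketch of a
non-sleeping CPU burner, not the massive_intr program mentioned below)
is to fork a handful of pure spinners and watch where they land; each
task reports its current CPU roughly once a second via sched_getcpu():

/* Fork N CPU hogs that never sleep; each periodically prints which CPU
 * it is running on, so placement can be watched without top. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void burn(int id)
{
	volatile unsigned long x = 0;
	time_t last = time(NULL);

	for (;;) {
		x++;				/* pure CPU, no sleeping */
		if (!(x & 0xffffff) && time(NULL) != last) {
			last = time(NULL);
			printf("task %d on cpu %d\n", id, sched_getcpu());
		}
	}
}

int main(int argc, char **argv)
{
	int i, n = argc > 1 ? atoi(argv[1]) : 4;

	for (i = 0; i < n; i++)
		if (fork() == 0)
			burn(i);
	for (i = 0; i < n; i++)
		wait(NULL);
	return 0;
}

With four such tasks on a 4-core/8-thread box they should settle on
four different cores; only with more tasks than cores should siblings
have to be shared.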

I just tested massive_intr (!sync) and tbench pairs (sync) on an E5620
box, and that's what I see happening.
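
To force the reporter's "detrimental" placement by hand and measure it,
an affinity wrapper along these lines (a sketch; plain taskset -c does
the same job) can pin two burners onto one core's siblings versus two
separate cores.  The cpu0/cpu4 pairing in the usage comment is an
assumption - the sysfs sibling map above gives the real layout:

/* Run a command pinned to an explicit set of CPUs, e.g.
 *   ./pin 0,4 ./burn 1   twice  -> two tasks on one core's siblings
 *   ./pin 0,1 ./burn 1   twice  -> two tasks on separate cores
 * (whether 0/4 are really siblings depends on the box). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	cpu_set_t set;
	char *tok;

	if (argc < 3) {
		fprintf(stderr, "usage: %s <cpu,cpu,...> <cmd> [args]\n",
			argv[0]);
		return 1;
	}

	CPU_ZERO(&set);
	for (tok = strtok(argv[1], ","); tok; tok = strtok(NULL, ","))
		CPU_SET(atoi(tok), &set);

	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	execvp(argv[2], &argv[2]);
	perror("execvp");
	return 1;
}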

A sync wakee landing on an idle sibling is neither black nor white...
more of a London fog + L.A. smog.

-Mike
