Re: [PATCH] sched: Calculate effective load even if local weight is 0

From: Mel Gorman
Date: Fri Jan 17 2014 - 09:56:37 EST

On Mon, Jan 13, 2014 at 01:22:40PM +0530, Preeti Murthy wrote:
> Hi,
> On Mon, Jan 6, 2014 at 5:09 PM, Mel Gorman <mgorman@xxxxxxx> wrote:
> > (Rik, you authored this patch so it should be sent from you and needs a
> > signed-off assuming people are ok with the changelog.)
> >
> > Thomas Hellstrom bisected a regression where erratic 3D performance is
> > experienced on virtual machines as measured by glxgears. It identified
> > commit 58d081b5 (sched/numa: Avoid overloading CPUs on a preferred NUMA
> > node) as the problem which had modified the behaviour of effective_load.
> >
> > Effective load calculates the difference to the system-wide load if a
> > scheduling entity was moved to another CPU. The task group is not heavier
> > as a result of the move but overall system load can increase/decrease as a
> > result of the change. Commit 58d081b5 (sched/numa: Avoid overloading CPUs
> > on a preferred NUMA node) changed effective_load to make it suitable for
> > calculating if a particular NUMA node was compute overloaded. To reduce
> > the cost of the function, it assumed that a current sched entity weight
> > of 0 was uninteresting but that is not the case.
> >
> > wake_affine uses a weight of 0 for sync wakeups on the grounds that it
> > is assuming the waking task will sleep and not contribute to load in the
> > near future. In this case, we still want to calculate the effective load
> > of the sched entity hierarchy. As effective_load is no longer used by
> Would it be worth mentioning that besides sync wakeups, wake_affine() uses a
> weight of 0 for the sched entity, for effective load calculation on
> the prev_cpu as well?
> This is so as to find the effect of moving this task away from the
> prev_cpu. Here too we are interested in calculating the effective load of
> the root task group of this sched entity on the prev_cpu and the below
> restored check will be relevant. Without the below check the difference in
> the loads of the wake affine CPU and the prev_cpu can get messed up.

I was too slow getting to this mail unfortunately. The patch is already
merged upstream with the changelog as-is.
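
For anyone reading this in the archive later, the point about a local
weight of 0 can be illustrated with a toy userspace sketch. This is not
the kernel's effective_load(); it is a one-level model under stated
assumptions. struct toy_group, toy_effective_load(), TOY_MIN_SHARES and
the skip_zero_wl flag are all hypothetical names made up for the
illustration. It only shows that a change in group weight (wg) can alter
the load a group's entity contributes on a CPU even when the local change
(wl) is 0, so returning early on wl == 0 hides a real difference.

```c
#include <stdbool.h>

/* Assumed floor on an entity's share; the value is illustrative only. */
#define TOY_MIN_SHARES 2L

/* Hypothetical one-level stand-in for a task group's state on one CPU. */
struct toy_group {
	long shares;       /* shares assigned to the whole group */
	long total_weight; /* sum of the group's runqueue weights, all CPUs */
	long local_weight; /* weight of the group's runqueue on this CPU */
	long se_weight;    /* current weight of the group's entity here */
};

/*
 * Return the change in this CPU's load if the local runqueue weight
 * changes by wl and the group-wide weight changes by wg.
 *
 * skip_zero_wl models the shortcut under discussion: treat wl == 0 as
 * "nothing to do" and report no change.
 */
long toy_effective_load(const struct toy_group *g, long wl, long wg,
			bool skip_zero_wl)
{
	long W, w, new_se;

	if (skip_zero_wl && wl == 0)
		return 0;

	W = g->total_weight + wg; /* group weight after the move */
	w = g->local_weight + wl; /* local weight after the move */

	/* The entity gets a slice of the shares proportional to w / W. */
	if (W > 0 && w < W)
		new_se = (w * g->shares) / W;
	else
		new_se = g->shares;

	if (new_se < TOY_MIN_SHARES)
		new_se = TOY_MIN_SHARES;

	/* Report the difference against the entity's current weight. */
	return new_se - g->se_weight;
}
```

With shares = 1024, total_weight = 2048, local_weight = 1024 and
se_weight = 512, calling toy_effective_load(&g, 0, 1024, false) gives
-171: the group got heavier elsewhere, so its entity on this CPU shrinks
from 512 to 341. With skip_zero_wl set to true the same call gives 0,
which is the kind of lost difference described above for the prev_cpu
side of the comparison.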

Mel Gorman
