Re: [patch 0/7] sched: change nohz idle load balancing logic to push model

From: Peter Zijlstra
Date: Thu May 20 2010 - 06:50:59 EST

On Mon, 2010-05-17 at 11:27 -0700, Suresh Siddha wrote:
> This is an updated version of patchset which is posted earlier at
> Description:
> Existing nohz idle load balance logic uses the pull model, with one
> idle load balancer CPU nominated on any partially idle system and that
> balancer CPU not going into nohz mode. With the periodic tick, the
> balancer does the idle balancing on behalf of all the CPUs in nohz mode.
> This is suboptimal and has a few issues:
> * the balancer will continue to have periodic ticks and wake up
>   frequently (at HZ rate), even though it may not have any rebalancing
>   to do on behalf of any of the idle CPUs.
> * On x86 and other CPUs whose APIC timer stops when the CPU is idle, this
>   periodic wakeup can result in an additional interrupt on a CPU doing
>   the timer broadcast.
> The alternative is to have a push model, where all idle CPUs can enter nohz
> mode and any busy CPU kicks one of the idle CPUs to take care of idle
> balancing on behalf of a group of idle CPUs.
> The following patches switch the idle load balancer to this push approach.
> Updates from the previous version:
> * Busy CPU uses send_remote_softirq() for invoking SCHED_SOFTIRQ on the
>   idle load balancing cpu, which does the load balancing on behalf of
>   all the idle CPUs.
> * Dropped the per NUMA node nohz load balancing as it doesn't detect
>   certain imbalance scenarios. This will be addressed later.

Looks good.

I think we want to keep init_remote_softirq_csd() and a function that
directly triggers the relevant softirq and make the networking code and
the block layer use that if possible -- and axe the rest of the
send_remote_softirq() infrastructure.

Also, I think it makes sense to fold patches 4-6.
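
For anyone following along, the push model from the quoted description can be
sketched as a small user-space simulation. This is only an illustration, not
the patchset's code: struct cpu_state, find_idle_balancer(),
run_idle_balance() and kick_idle_balancer() are hypothetical names, the
"kick" is a direct function call standing in for raising SCHED_SOFTIRQ on a
remote CPU, and the balancing rule (move one task at a time from the busiest
CPU) is a deliberately simplified assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical per-CPU state, for this simulation only. */
struct cpu_state {
    bool nohz_idle;   /* CPU is idle with its tick stopped */
    int  nr_running;  /* number of runnable tasks */
    int  kicks;       /* times this CPU was kicked to do idle balancing */
};

static struct cpu_state cpus[NR_CPUS];

/* Put every simulated CPU back into the "no tasks, tick running" state. */
static void reset_cpus(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        cpus[i].nohz_idle = false;
        cpus[i].nr_running = 0;
        cpus[i].kicks = 0;
    }
}

/* Return the first nohz-idle CPU, or -1 if no CPU is in nohz mode. */
static int find_idle_balancer(void)
{
    for (int i = 0; i < NR_CPUS; i++)
        if (cpus[i].nohz_idle)
            return i;
    return -1;
}

/*
 * Stand-in for the softirq handler on the kicked CPU: balance on behalf
 * of every nohz-idle CPU. Returns the number of tasks moved.
 */
static int run_idle_balance(void)
{
    int moved = 0;

    for (int i = 0; i < NR_CPUS; i++) {
        int busiest = 0;

        if (!cpus[i].nohz_idle)
            continue;

        for (int j = 1; j < NR_CPUS; j++)
            if (cpus[j].nr_running > cpus[busiest].nr_running)
                busiest = j;

        /* Only pull if the busiest CPU has more than one task. */
        if (cpus[busiest].nr_running < 2)
            break;

        cpus[busiest].nr_running--;
        cpus[i].nr_running++;
        cpus[i].nohz_idle = false;  /* it now has work, so it leaves nohz */
        moved++;
    }
    return moved;
}

/*
 * Called from a busy CPU's tick: if this CPU is overloaded and some CPU
 * is nohz-idle, kick that CPU. Returns the kicked CPU, or -1 if no kick.
 */
static int kick_idle_balancer(int busy_cpu)
{
    int ilb;

    assert(busy_cpu >= 0 && busy_cpu < NR_CPUS);

    if (cpus[busy_cpu].nr_running < 2)
        return -1;

    ilb = find_idle_balancer();
    if (ilb < 0)
        return -1;

    cpus[ilb].kicks++;
    run_idle_balance();  /* the real kick would be asynchronous */
    return ilb;
}
```

The point of the sketch is that no CPU has to keep its periodic tick running
just to poll for imbalance: balancing work happens only when a busy CPU
actually asks for it.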

