Re: [sched, patch] better wake-balancing, #3

From: Ingo Molnar
Date: Sat Jul 30 2005 - 02:20:32 EST

* Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> > here's an updated patch. It handles one more detail: on SCHED_SMT we
> > should check the idleness of siblings too. Benchmark numbers still
> > look good.
>
> Maybe. Ken hasn't measured the effect of wake balancing in 2.6.13,
> which is quite a lot different to that found in 2.6.12.
>
> I don't really like having a hard cutoff like that - wake balancing can
> be important for IO workloads, though I haven't measured for a long
> time. [...]

well, i have measured it, and it was a win for just about everything
that is not idle, and even for an IPC (SysV semaphores) half-idle
workload i've measured a 3% gain. No performance loss in tbench either,
which is clearly the most sensitive to affine/passive balancing. But i'd
like to see what Ken's (and others') numbers are.

the hard cutoff also has the benefit that it allows us to potentially
make wakeup migration _more_ aggressive in the future. So instead of
having to think about weakening it due to the tradeoffs present in e.g.
Ken's workload, we can actually make it stronger.
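
[Editor's sketch, not part of the patch.] One way to read the hard cutoff
discussed above, together with the SCHED_SMT sibling check, is as a toy
decision function in C. Everything here is hypothetical and for illustration
only: struct toy_cpu, TOY_NR_CPUS, toy_cpu_and_siblings_idle() and
toy_select_wake_cpu() are invented names, and "idle" is assumed to mean an
empty run queue, which is a simplification of whatever test the real patch
uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT kernel code. */
#define TOY_NR_CPUS 8

struct toy_cpu {
    unsigned int nr_running;     /* runnable tasks queued on this CPU */
    unsigned long sibling_mask;  /* bit i set => CPU i is an SMT sibling */
};

/*
 * Assumption: a CPU counts as idle for wake balancing only if its own
 * run queue and the run queues of all its SMT siblings are empty.
 */
static bool toy_cpu_and_siblings_idle(const struct toy_cpu *cpus, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);

    if (cpus[cpu].nr_running != 0)
        return false;

    for (int i = 0; i < TOY_NR_CPUS; i++) {
        if ((cpus[cpu].sibling_mask & (1UL << i)) && cpus[i].nr_running != 0)
            return false;
    }
    return true;
}

/*
 * Hard cutoff: pull the woken task to the waking CPU (this_cpu) only if
 * this_cpu and its siblings are idle; otherwise queue it where it last
 * ran (prev_cpu).
 */
static int toy_select_wake_cpu(const struct toy_cpu *cpus,
                               int this_cpu, int prev_cpu)
{
    if (this_cpu != prev_cpu && toy_cpu_and_siblings_idle(cpus, this_cpu))
        return this_cpu;
    return prev_cpu;
}
```

Under these assumptions the cutoff is a single yes/no test on the target CPU,
so making migration more aggressive later would only change what happens on
the "idle" branch, not when that branch is taken.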

> [...] In IPC workloads, the cache affinity of local wakeups becomes
> less apparent when the runqueue gets lots of tasks on it, however
> benefits of IO affinity will generally remain. Especially on NUMA
> systems.

especially on NUMA, if the migration-target CPU (this_cpu) is not at
least partially idle, i'd be quite uneasy to passive balance from
another node. I suspect this needs numbers from Martin and John?

> fork/clone/exec/etc balancing really doesn't do anything to capture
> this kind of relationship between tasks and between tasks and IRQ
> sources. Without wake balancing we basically have a completely random
> scattering of tasks.

Ken's workload is a heavy IO one with lots of IRQ sources. And precisely
for this type of workload the best tactic is usually to leave the task
alone and queue it wherever it last ran.

whenever there's a strong (and exclusive) relationship between tasks and
individual interrupt sources, explicit binding to CPUs/groups of CPUs is
the best method. In any case, more measurements are needed.
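
[Editor's sketch.] For the explicit binding mentioned above, userspace on
Linux can use sched_setaffinity(2). The helper names pin_self_to_cpu() and
first_allowed_cpu() are hypothetical; the sketch assumes glibc, where the
CPU_* macros require _GNU_SOURCE to be defined before any header is included.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for CPU_SET() and sched_setaffinity() in glibc */
#endif
#include <errno.h>
#include <sched.h>

/*
 * Pin the calling thread to one CPU.
 * Returns 0 on success, -1 on error with errno set.
 */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(mask), &mask);
}

/*
 * Return the lowest-numbered CPU the caller is currently allowed to run
 * on, or -1 if the affinity mask cannot be read.
 */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    }
    return -1;
}
```

A task that services one interrupt source could call pin_self_to_cpu() with
the CPU that handles that IRQ; which CPU that is depends on the system's IRQ
affinity settings and is not something this sketch tries to discover.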
