Re: [PATCH] nohz: Revert "nohz: Set isolcpus when nohz_full is set"

From: Chris Metcalf
Date: Mon Oct 12 2015 - 12:55:54 EST


On 10/12/2015 12:53 PM, Paul E. McKenney wrote:
On Mon, Oct 12, 2015 at 06:20:03PM +0200, Frederic Weisbecker wrote:
On Mon, Oct 12, 2015 at 08:32:02AM -0700, Paul E. McKenney wrote:
On Mon, Oct 12, 2015 at 05:21:23PM +0200, Frederic Weisbecker wrote:
This reverts commit 8cb9764fc88b41db11f251e8b2a0d006578b7eb4.

We assumed that nohz full users always want scheduler isolation on full
dynticks CPUs, so we included nohz full CPUs in cpu_isolated_map.
This means that tasks run by default on CPUs outside the nohz_full range
unless their affinity is explicitly overridden.

This suits pure isolation workloads, but when the machine also needs to
run common workloads, the set of CPUs available to run common tasks
is reduced.

We reach an extreme case when CONFIG_NO_HZ_FULL_ALL is enabled, as it
leaves only CPU 0 for non-isolation tasks, which makes people think that
their supercomputer has regressed to a 90s UP machine.

Some nohz full users appear to be interested in running normal workloads
either before or after an isolation workload. Nohz full isn't optimized
for normal workloads, but it's still better than UP performance.

We are reaching a limitation in kernel presets here. Let's revert this
cpu_isolated_map inclusion and let userspace do its own scheduler
isolation using cpusets or explicit affinity settings.
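With the revert, scheduler isolation on nohz_full CPUs becomes userspace's job. As a minimal sketch of the explicit-affinity route, assuming Linux with glibc, a task can restrict itself to a single CPU with sched_setaffinity(2). The helper names first_allowed_cpu() and pin_self_to_cpu() are hypothetical, not kernel or glibc APIs; in a real deployment the CPU passed in would be one of the nohz_full CPUs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: return the lowest-numbered CPU in the calling
 * thread's current affinity mask, or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Hypothetical helper: restrict the calling thread (pid 0) to one CPU.
 * In an isolation setup, 'cpu' would be a nohz_full CPU that ordinary
 * tasks are kept off via cpusets or their own affinity settings.
 * Returns 0 on success, -1 on failure. */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}
```

From the shell, util-linux's `taskset -c <cpu> <command>` achieves the same for a newly started process.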

Reported-by: Ingo Molnar <mingo@xxxxxxxxxx>
Reported-by: Mike Galbraith <umgwanakikbuti@xxxxxxxxx>
Cc: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Mike Galbraith <umgwanakikbuti@xxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Dave Jones <davej@xxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Alexey Dobriyan <adobriyan@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
---
kernel/sched/core.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6159531..3c35b5f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7238,9 +7238,6 @@ void __init sched_init_smp(void)
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);

- /* nohz_full won't take effect without isolating the cpus. */
- tick_nohz_full_add_cpus_to(cpu_isolated_map);
-
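The removed call is what turned every nohz_full CPU into an isolated CPU. The following is a simplified userspace model, not kernel code: CPU masks are plain 64-bit words instead of struct cpumask, the helper names are hypothetical, and it assumes the removed call amounts to "isolated |= nohz_full" and that the CPUs left for ordinary tasks are roughly "possible & ~isolated", as with non_isolated_cpus in sched_init_smp(). It shows why CONFIG_NO_HZ_FULL_ALL, which puts every CPU except the boot CPU in the nohz_full set, left only CPU 0 for ordinary tasks.

```c
#include <stdint.h>

/* Model only: one bit per CPU, CPUs 0..63. */
typedef uint64_t cpumask64;

/* Mask with CPUs 0..nr_cpus-1 set. */
cpumask64 mask_first_n(unsigned nr_cpus)
{
	return nr_cpus >= 64 ? ~(cpumask64)0
			     : (((cpumask64)1 << nr_cpus) - 1);
}

/* Assumed model of the removed tick_nohz_full_add_cpus_to(cpu_isolated_map):
 * OR the nohz_full CPUs into the isolated set. */
cpumask64 isolate_nohz_full(cpumask64 isolated, cpumask64 nohz_full)
{
	return isolated | nohz_full;
}

/* CPUs left for ordinary (non-isolated) tasks. */
cpumask64 non_isolated(cpumask64 possible, cpumask64 isolated)
{
	return possible & ~isolated;
}
```

With 8 CPUs and nohz_full covering CPUs 1-7, the pre-revert model leaves only CPU 0 (mask 0x1) for ordinary tasks; without the OR, all eight CPUs (mask 0xFF) remain available unless isolcpus= or cpusets restrict them.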
Why not make this controlled by a boot parameter? That preserves
the ease of use for those needing it, but avoids problems from people
doing "make randconfig".
Well, it already is. Just as you pass nohz_full=1-32, you can also pass isolcpus=1-32.
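Passing the same list twice makes it easy for the two parameters to drift apart. As a hedged illustration, a hypothetical userspace check could confirm that nohz_full= and isolcpus= name the same CPUs. The names parse_cpulist() and cpulists_match() are invented for this sketch, and the parser accepts only a simplified subset of the kernel's CPU list syntax: comma-separated CPU numbers and "a-b" ranges, limited to CPUs 0-63.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a list such as "1-32" or "0,2,4-7" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or CPUs above 63. */
int parse_cpulist(const char *s, uint64_t *out)
{
	uint64_t mask = 0;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			mask |= (uint64_t)1 << cpu;
		if (*s == '\0')
			break;
		if (*s != ',')			/* entries are comma-separated */
			return -1;
		s++;
	}
	*out = mask;
	return 0;
}

/* Nonzero if both lists parse and name exactly the same CPUs. */
int cpulists_match(const char *nohz_full, const char *isolcpus)
{
	uint64_t a, b;

	if (parse_cpulist(nohz_full, &a) != 0 ||
	    parse_cpulist(isolcpus, &b) != 0)
		return 0;
	return a == b;
}
```

For example, cpulists_match("1-4", "1,2,3,4") is nonzero, while cpulists_match("1-32", "1-31") is zero.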
True enough. Not sure that having to repeat the CPU list twice qualifies as
"easy to use", though. Why not a nohz_full_iso or some such that isolates
whatever CPUs you specified?

Is it worth starting to think about grouping things under the
"task isolation" model somehow? "task_isolation_cpus=1-31"
or some such for this, and then that just sets up the nohz_full
and isolcpus options under the hood?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
