Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independent from nohz

From: Luiz Capitulino
Date: Fri Aug 11 2017 - 15:10:19 EST


On Fri, 21 Jul 2017 15:21:28 +0200
Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> The housekeeping is currently driven by nohz_full where any CPU that
> is not in the nohz_full range is considered as a housekeeper. This is
> a design mistake because nohz is just a detail among all the existing
> isolation features. Nohz shouldn't imply anything other than tick-related
> things.
>
> We would rather drive all the isolation features from the housekeeping
> subsystem which is responsible for all the work that can be either
> affined (unpinned workqueues, timers, kthreads, ...) or offloaded
> (scheduler tick, ...).

That makes a lot of sense. I think this is moving in the right
direction. I have a comment below though.

>
> Let's start with a boot option to define the housekeepers. We should be
> able to further enhance that through cpusets.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Chris Metcalf <cmetcalf@xxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Mike Galbraith <efault@xxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Cc: Wanpeng Li <kernellwp@xxxxxxxxx>
> Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
> ---
> include/linux/housekeeping.h | 2 --
> init/main.c | 2 --
> kernel/housekeeping.c | 22 ++++++++++------------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/housekeeping.h b/include/linux/housekeeping.h
> index 320cc2b..ba769c8 100644
> --- a/include/linux/housekeeping.h
> +++ b/include/linux/housekeeping.h
> @@ -11,7 +11,6 @@ extern int housekeeping_any_cpu(void);
> extern const struct cpumask *housekeeping_cpumask(void);
> extern void housekeeping_affine(struct task_struct *t);
> extern bool housekeeping_test_cpu(int cpu);
> -extern void __init housekeeping_init(void);
>
> #else
>
> @@ -26,7 +25,6 @@ static inline const struct cpumask *housekeeping_cpumask(void)
> }
>
> static inline void housekeeping_affine(struct task_struct *t) { }
> -static inline void housekeeping_init(void) { }
> #endif /* CONFIG_NO_HZ_FULL */
>
> static inline bool housekeeping_cpu(int cpu)
> diff --git a/init/main.c b/init/main.c
> index 9904a1e..9789ab7 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -46,7 +46,6 @@
> #include <linux/cgroup.h>
> #include <linux/efi.h>
> #include <linux/tick.h>
> -#include <linux/housekeeping.h>
> #include <linux/interrupt.h>
> #include <linux/taskstats_kern.h>
> #include <linux/delayacct.h>
> @@ -608,7 +607,6 @@ asmlinkage __visible void __init start_kernel(void)
> early_irq_init();
> init_IRQ();
> tick_init();
> - housekeeping_init();
> rcu_init_nohz();
> init_timers();
> hrtimers_init();
> diff --git a/kernel/housekeeping.c b/kernel/housekeeping.c
> index f8be7e6..a54765d 100644
> --- a/kernel/housekeeping.c
> +++ b/kernel/housekeeping.c
> @@ -45,23 +45,21 @@ bool housekeeping_test_cpu(int cpu)
> return true;
> }
>
> -void __init housekeeping_init(void)
> +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
> +static int __init housekeeping_setup(char *str)
> {
> - if (!tick_nohz_full_enabled())
> - return;
> -
> - if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
> - WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
> - cpumask_clear(tick_nohz_full_mask);
> - tick_nohz_full_running = false;
> - return;
> + alloc_bootmem_cpumask_var(&housekeeping_mask);
> + if (cpulist_parse(str, housekeeping_mask) < 0) {
> + pr_warn("Housekeeping: Incorrect cpumask\n");
> + free_bootmem_cpumask_var(housekeeping_mask);
> + return 1;
> }
>
> - cpumask_andnot(housekeeping_mask,
> - cpu_possible_mask, tick_nohz_full_mask);
> -
> static_branch_enable(&housekeeping_overriden);
>
> /* We need at least one CPU to handle housekeeping work */
> WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
> +
> + return 1;
> }
> +__setup("housekeeping=", housekeeping_setup);
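
The new option passes its argument straight to cpulist_parse() and, on failure, warns and frees the mask. To make the accepted input concrete, here is a simplified userspace sketch of that parsing step. It assumes at most 64 CPUs, handles only plain CPU numbers and "A-B" ranges, and its helpers (parse_cpulist(), read_cpu(), MAX_CPUS) are hypothetical, not kernel API:

```c
#include <ctype.h>
#include <stdint.h>

#define MAX_CPUS 64 /* assumption: the whole mask fits in a uint64_t */

/* Read one decimal CPU number at *sp; reject values >= MAX_CPUS. */
static int read_cpu(const char **sp, unsigned int *out)
{
	const char *s = *sp;
	unsigned int v = 0;

	if (!isdigit((unsigned char)*s))
		return -1;
	while (isdigit((unsigned char)*s)) {
		v = v * 10 + (unsigned int)(*s++ - '0');
		if (v >= MAX_CPUS)
			return -1;
	}
	*sp = s;
	*out = v;
	return 0;
}

/*
 * Hypothetical, simplified stand-in for cpulist_parse(): accepts
 * "N" and "A-B" items separated by commas. Returns 0 and fills *mask
 * on success, -1 on malformed input (where the patch warns and frees).
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s) {
		unsigned int lo, hi;

		if (read_cpu(&s, &lo) < 0)
			return -1;
		hi = lo;
		if (*s == '-') {
			s++;
			if (read_cpu(&s, &hi) < 0 || hi < lo)
				return -1;
		}
		for (unsigned int cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		if (*s == ',') {
			if (*++s == '\0')
				return -1; /* trailing comma */
		} else if (*s != '\0') {
			return -1; /* unexpected character */
		}
	}
	*mask = m;
	return 0;
}
```

Fed the list from the command line in the log below, "0,2,4,6,8,10,12,14,1", this sketch yields 0x5557: the even CPUs 0-14 plus CPU 1.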

Am I right that from now on nohz_full= users will also have
to specify housekeeping= in order to get nohz_full working?
If that's correct, then won't this patch break nohz_full for
existing setups?
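
If keeping nohz_full=-only command lines working is a goal, one option is to fall back to the old derivation when housekeeping= is absent: the possible CPUs minus the nohz_full CPUs, which is what the removed cpumask_andnot() call computed. A userspace sketch of that precedence follows; a uint64_t stands in for struct cpumask, and struct boot_masks and effective_housekeeping() are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical boot-time state; a uint64_t stands in for struct cpumask. */
struct boot_masks {
	uint64_t possible;     /* cpu_possible_mask */
	uint64_t nohz_full;    /* parsed from nohz_full=, 0 if absent */
	uint64_t housekeeping; /* parsed from housekeeping=, if present */
	bool housekeeping_set; /* true when housekeeping= was given */
};

/*
 * An explicit housekeeping= wins. Otherwise fall back to the pre-patch
 * rule, possible & ~nohz_full (the removed cpumask_andnot() call), so
 * command lines that only pass nohz_full= keep their old behaviour.
 * The caller should still warn if the result is empty.
 */
static uint64_t effective_housekeeping(const struct boot_masks *b)
{
	if (b->housekeeping_set)
		return b->housekeeping & b->possible;
	if (b->nohz_full)
		return b->possible & ~b->nohz_full;
	return b->possible; /* no isolation requested: every CPU housekeeps */
}
```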

Also, I just gave this series a try and got this:

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.13.0-rc4+ root=/dev/mapper/rhel_virtlab508-root ro crashkernel=auto rd.lvm.lv=rhel_virtlab508/root rd.lvm.lv=rhel_virtlab508/swap console=ttyS1,115200 LANG=en_US.UTF-8 housekeeping=0,2,4,6,8,10,12,14,1 isolcpus=15 nohz_full=15 intel_pstate=disable
[ 0.000000] static_key_slow_inc used before call to jump_label_init
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:108 static_key_slow_inc+0x86/0xa0
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.13.0-rc4+ #2
[ 0.000000] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
[ 0.000000] task: ffffffffb6010480 task.stack: ffffffffb6000000
[ 0.000000] RIP: 0010:static_key_slow_inc+0x86/0xa0
[ 0.000000] RSP: 0000:ffffffffb6003d98 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[ 0.000000] RAX: 0000000000000037 RBX: ffffffffb66aa780 RCX: ffffffffb6061308
[ 0.000000] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000002
[ 0.000000] RBP: ffffffffb6003da0 R08: 6b5f636974617473 R09: 00000000000001e4
[ 0.000000] R10: 776f6c735f79656b R11: 0000000000000000 R12: ffff972c3ffd1cfe
[ 0.000000] R13: ffffffffffffffff R14: 0000000000000000 R15: 000000000000000d
[ 0.000000] FS: 0000000000000000(0000) GS:ffff97282ea00000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.000000] CR2: ffff972905974000 CR3: 0000000545209000 CR4: 00000000000406b0
[ 0.000000] Call Trace:
[ 0.000000] static_key_enable+0x1d/0x30
[ 0.000000] housekeeping_setup+0x5a/0x7e
[ 0.000000] unknown_bootoption+0x8b/0x19a
[ 0.000000] parse_args+0x224/0x3b0
[ 0.000000] ? set_init_arg+0x5a/0x5a
[ 0.000000] start_kernel+0x209/0x4cd
[ 0.000000] ? set_init_arg+0x5a/0x5a
[ 0.000000] ? early_idt_handler_array+0x120/0x120
[ 0.000000] x86_64_start_reservations+0x24/0x26
[ 0.000000] x86_64_start_kernel+0x14c/0x16f
[ 0.000000] secondary_startup_64+0x9f/0x9f
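
Reading the trace, housekeeping_setup() runs from parse_args() via unknown_bootoption(), and the static key enable inside it reaches static_key_slow_inc() before jump_label_init() has run. One possible fix is to only parse and record the mask in the __setup handler, and enable the static key from an init hook that runs after jump_label_init(), for example a retained housekeeping_init(). The model below is a userspace sketch of that ordering only; every name in it is hypothetical and it does not use the real jump label API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the boot ordering seen in the trace above. */
static bool jump_label_ready;       /* set by the jump_label_init() stand-in */
static bool housekeeping_key;       /* stand-in for housekeeping_overriden */
static bool housekeeping_requested; /* recorded by the __setup stage */
static int early_key_uses;          /* times the key was flipped too early */

/* Stand-in for static_key_enable(): the kernel WARNs where we count. */
static void model_key_enable(bool *key)
{
	if (!jump_label_ready)
		early_key_uses++;
	*key = true;
}

static void model_jump_label_init(void)
{
	jump_label_ready = true;
}

/* __setup("housekeeping=") stage: remember the result, don't touch the key. */
static int model_housekeeping_setup(bool cpulist_ok)
{
	if (cpulist_ok)
		housekeeping_requested = true;
	return 1;
}

/* Later init stage; must only run after model_jump_label_init(). */
static void model_housekeeping_init(void)
{
	assert(jump_label_ready);
	if (housekeeping_requested)
		model_key_enable(&housekeeping_key);
}
```

With the patch as posted, the key enable happens in the setup stage itself, which is the "used before call to jump_label_init" case the WARN above reports.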