Re: [tip: sched/core] sched/core: Initialize the idle task with preemption disabled

From: Guenter Roeck
Date: Tue Jul 06 2021 - 15:45:01 EST


Hi,

On Wed, May 12, 2021 at 11:15:54AM -0000, tip-bot2 for Valentin Schneider wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674
> Gitweb: https://git.kernel.org/tip/f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674
> Author: Valentin Schneider <valentin.schneider@xxxxxxx>
> AuthorDate: Wed, 12 May 2021 10:46:36 +01:00
> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> CommitterDate: Wed, 12 May 2021 13:01:45 +02:00
>
> sched/core: Initialize the idle task with preemption disabled
>
> As pointed out by commit
>
> de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread")
>
> init_idle() can and will be invoked more than once on the same idle
> task. At boot time, it is invoked for the boot CPU thread by
> sched_init(). Then smp_init() creates the threads for all the secondary
> CPUs and invokes init_idle() on them.
>
> As the hotplug machinery brings the secondaries to life, it will issue
> calls to idle_thread_get(), which itself invokes init_idle() yet again.
> In this case it's invoked twice more per secondary: at _cpu_up(), and at
> bringup_cpu().
>
> Given smp_init() already initializes the idle tasks for all *possible*
> CPUs, no further initialization should be required. Now, removing
> init_idle() from idle_thread_get() exposes some interesting expectations
> with regards to the idle task's preempt_count: the secondary startup always
> issues a preempt_disable(), requiring some reset of the preempt count to 0
> between hot-unplug and hotplug, which is currently served by
> idle_thread_get() -> idle_init().
>
> Given the idle task is supposed to have preemption disabled once and never
> see it re-enabled, it seems that what we actually want is to initialize its
> preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
> init_idle() from idle_thread_get().
>
> Secondary startups were patched via coccinelle:
>
> @begone@
> @@
>
> -preempt_disable();
> ...
> cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Acked-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@xxxxxxx

This patch results in several messages similar to the following
when booting s390 images in qemu.

[ 1.690807] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 1.690925] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[ 1.691053] no locks held by swapper/0/1.
[ 1.691310] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-11788-g79160a603bdb #1
[ 1.691469] Hardware name: QEMU 2964 QEMU (KVM/Linux)
[ 1.691612] Call Trace:
[ 1.691718] [<0000000000d98bb0>] show_stack+0x90/0xf8
[ 1.692040] [<0000000000da894c>] dump_stack_lvl+0x74/0xa8
[ 1.692134] [<0000000000187e52>] ___might_sleep+0x15a/0x170
[ 1.692228] [<000000000014f588>] cpus_read_lock+0x38/0xc0
[ 1.692320] [<0000000000182e8a>] smpboot_register_percpu_thread+0x2a/0x160
[ 1.692412] [<00000000014814b8>] cpuhp_threads_init+0x28/0x60
[ 1.692505] [<0000000001487a30>] smp_init+0x28/0x90
[ 1.692597] [<00000000014779a6>] kernel_init_freeable+0x1f6/0x270
[ 1.692689] [<0000000000db7466>] kernel_init+0x2e/0x160
[ 1.692779] [<0000000000103618>] __ret_from_fork+0x40/0x58
[ 1.692870] [<0000000000dc6e12>] ret_from_fork+0xa/0x30

Reverting this patch fixes the problem.
Bisect log is attached.

Guenter

---
# bad: [007b350a58754a93ca9fe50c498cc27780171153] Merge tag 'dlm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
# good: [62fb9874f5da54fdb243003b386128037319b219] Linux 5.13
git bisect start '007b350a5875' '62fb9874f5da'
# bad: [36824f198c621cebeb22966b5e244378fa341295] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad 36824f198c621cebeb22966b5e244378fa341295
# bad: [9269d27e519ae9a89be8d288f59d1ec573b0c686] Merge tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9269d27e519ae9a89be8d288f59d1ec573b0c686
# good: [69609a91ac1d82f9c958a762614edfe0ac8498e3] Merge tag 'spi-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect good 69609a91ac1d82f9c958a762614edfe0ac8498e3
# good: [a15286c63d113d4296c58867994cd266a28f5d6d] Merge tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good a15286c63d113d4296c58867994cd266a28f5d6d
# bad: [0159bb020ca9a43b17aa9149f1199643c1d49426] Documentation: Add usecases, design and interface for core scheduling
git bisect bad 0159bb020ca9a43b17aa9149f1199643c1d49426
# good: [97886d9dcd86820bdbc1fa73b455982809cbc8c2] sched: Migration changes for core scheduling
git bisect good 97886d9dcd86820bdbc1fa73b455982809cbc8c2
# bad: [fcb501704554eebfd27e3220b0540997fd2b24a8] delayacct: Document task_delayacct sysctl
git bisect bad fcb501704554eebfd27e3220b0540997fd2b24a8
# bad: [cc00c1988801dc71f63bb7bad019e85046865095] sched: Fix leftover comment typos
git bisect bad cc00c1988801dc71f63bb7bad019e85046865095
# good: [7ac592aa35a684ff1858fb9ec282886b9e3575ac] sched: prctl() core-scheduling interface
git bisect good 7ac592aa35a684ff1858fb9ec282886b9e3575ac
# bad: [f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674] sched/core: Initialize the idle task with preemption disabled
git bisect bad f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674
# good: [9f26990074931bbf797373e53104216059b300b1] kselftest: Add test for core sched prctl interface
git bisect good 9f26990074931bbf797373e53104216059b300b1
# first bad commit: [f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674] sched/core: Initialize the idle task with preemption disabled