Re: [tip: sched/core] sched/core: Initialize the idle task with preemption disabled

From: Frederic Weisbecker
Date: Wed Jul 07 2021 - 08:03:10 EST


On Wed, Jul 07, 2021 at 12:55:20AM +0100, Valentin Schneider wrote:
>
> Hi Guenter,
>
> On 06/07/21 12:44, Guenter Roeck wrote:
> > This patch results in several messages similar to the following
> > when booting s390 images in qemu.
> >
> > [ 1.690807] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
> > [ 1.690925] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
> > [ 1.691053] no locks held by swapper/0/1.
> > [ 1.691310] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-11788-g79160a603bdb #1
> > [ 1.691469] Hardware name: QEMU 2964 QEMU (KVM/Linux)
> > [ 1.691612] Call Trace:
> > [ 1.691718] [<0000000000d98bb0>] show_stack+0x90/0xf8
> > [ 1.692040] [<0000000000da894c>] dump_stack_lvl+0x74/0xa8
> > [ 1.692134] [<0000000000187e52>] ___might_sleep+0x15a/0x170
> > [ 1.692228] [<000000000014f588>] cpus_read_lock+0x38/0xc0
> > [ 1.692320] [<0000000000182e8a>] smpboot_register_percpu_thread+0x2a/0x160
> > [ 1.692412] [<00000000014814b8>] cpuhp_threads_init+0x28/0x60
> > [ 1.692505] [<0000000001487a30>] smp_init+0x28/0x90
> > [ 1.692597] [<00000000014779a6>] kernel_init_freeable+0x1f6/0x270
> > [ 1.692689] [<0000000000db7466>] kernel_init+0x2e/0x160
> > [ 1.692779] [<0000000000103618>] __ret_from_fork+0x40/0x58
> > [ 1.692870] [<0000000000dc6e12>] ret_from_fork+0xa/0x30
> >
> > Reverting this patch fixes the problem.
> > Bisect log is attached.
> >
> > Guenter
> >
>
> Thanks for the report.
>
> So somehow the init task ends up with a non-zero preempt_count()? Per
> FORK_PREEMPT_COUNT we should exit __ret_from_fork() with a zero count, are
> you hitting the WARN_ONCE() in finish_task_switch()?
>
> Does CONFIG_DEBUG_PREEMPT=y yield anything interesting?
>
> I can't make sense of this right now, but it's a bit late :) I'll grab some
> toolchain+qemu tomorrow and go poke at it (and while at it I need to do the
> same with powerpc).

One possible issue is that s390's init_idle_preempt_count() doesn't operate on
the target idle task but on the _current_ CPU's preempt count. And since
smp_init() -> idle_threads_init() runs on the boot CPU on behalf of the other
CPUs, we end up overwriting the current CPU's preempt_count() instead of the
target's.
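
To illustrate the suspicion, here is a minimal userspace model, not the actual
s390 code. Everything in it is hypothetical: cpu_preempt_count[] stands in for
the per-CPU count, current_cpu for the CPU running smp_init(), the two
init_idle_preempt_count_*() variants and idle_threads_init_model() are made-up
names, and the PREEMPT_* values are simplified.

```c
#include <assert.h>

#define NR_CPUS          4
#define PREEMPT_ENABLED  0	/* simplified value, assumption */
#define PREEMPT_DISABLED 1	/* simplified value, assumption */

/* Hypothetical stand-in for the per-CPU preempt counts. */
static int cpu_preempt_count[NR_CPUS];

/* CPU the caller runs on; the boot CPU while smp_init() executes. */
static int current_cpu = 0;

/* Suspected behaviour: ignores @cpu and writes the current CPU's count. */
static void init_idle_preempt_count_current(int cpu)
{
	(void)cpu;
	cpu_preempt_count[current_cpu] = PREEMPT_DISABLED;
}

/* Intended behaviour: writes the count of the target CPU. */
static void init_idle_preempt_count_target(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_preempt_count[cpu] = PREEMPT_DISABLED;
}

/* Models the boot CPU setting up the idle task of every other CPU. */
static void idle_threads_init_model(void (*init)(int cpu))
{
	int cpu;

	/* Start from a clean state: every CPU preemptible. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_preempt_count[cpu] = PREEMPT_ENABLED;

	/* Initialize the idle task of each CPU except the one we run on. */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != current_cpu)
			init(cpu);
	}
}
```

In this model, idle_threads_init_model(init_idle_preempt_count_current) leaves
cpu_preempt_count[0] at PREEMPT_DISABLED and the other CPUs untouched, which
would be consistent with the in_atomic(): 1 seen for swapper/0 in the report.
With init_idle_preempt_count_target the boot CPU's count stays at
PREEMPT_ENABLED.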