RE: [PATCH v2] rcutorture: Convert schedule_timeout_uninterruptible() to mdelay() in rcu_torture_stall()

From: Zhang, Qiang1
Date: Mon Mar 20 2023 - 20:35:33 EST


> > For kernels built with PREEMPT_NONE and CONFIG_DEBUG_ATOMIC_SLEEP enabled,
> > running the RCU stall tests as follows produces the splat shown below.
> >
> > runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4"
> > bootparams="nokaslr console=ttyS0 rcutorture.stall_cpu=30
> > rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_irqsoff=1
> > rcutorture.stall_cpu_block=1" -d
> >
> > [ 10.841071] rcu-torture: rcu_torture_stall begin CPU stall
> > [ 10.841073] rcu_torture_stall start on CPU 3.
> > [ 10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
> > ....
> > [ 10.841108] Call Trace:
> > [ 10.841110] <TASK>
> > [ 10.841112] dump_stack_lvl+0x64/0xb0
> > [ 10.841118] dump_stack+0x10/0x20
> > [ 10.841121] __schedule_bug+0x8b/0xb0
> > [ 10.841126] __schedule+0x2172/0x2940
> > [ 10.841157] schedule+0x9b/0x150
> > [ 10.841160] schedule_timeout+0x2e8/0x4f0
> > [ 10.841192] schedule_timeout_uninterruptible+0x47/0x50
> > [ 10.841195] rcu_torture_stall+0x2e8/0x300
> > [ 10.841199] kthread+0x175/0x1a0
> > [ 10.841206] ret_from_fork+0x2c/0x50
> >
> > The above call trace occurs because schedule_timeout() is called inside
> > a local_irq_disable()/local_irq_enable() critical section. Invoking
> > schedule_timeout() also implies a quiescent state, so it fails to trigger
> > an RCU stall. This commit therefore uses mdelay() instead of
> > schedule_timeout() to trigger the RCU stall.
> >
> > Suggested-by: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> > ---
> > kernel/rcu/rcutorture.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index d06c2da04c34..a08a72bef5f1 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -2472,7 +2472,7 @@ static int rcu_torture_stall(void *args)
> >
> >Right here there is:
> >
> > if (stall_cpu_block) {
> >
> >In other words, the rcutorture.stall_cpu_block module parameter says to
> >block, even if it is a bad thing to do. The point of this is to verify
> >the error messages that are supposed to be printed on the console when
> >this happens.
> >
> > #ifdef CONFIG_PREEMPTION
> > preempt_schedule();
> > #else
> > - schedule_timeout_uninterruptible(HZ);
> > + mdelay(jiffies_to_msecs(HZ));
> >
> >So this really needs to stay schedule_timeout_uninterruptible(HZ).
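
For reference, the proposed replacement keeps the same one-second duration and changes only how that second is spent: schedule_timeout_uninterruptible(HZ) sleeps for HZ jiffies, whereas mdelay(jiffies_to_msecs(HZ)) busy-waits for the equivalent number of milliseconds. The user-space sketch below checks that arithmetic. It is a simplified model, not the kernel's jiffies_to_msecs() implementation; the function name model_jiffies_to_msecs() and passing hz as an argument are assumptions made for illustration.

```c
/* Simplified user-space model of the jiffies-to-milliseconds conversion.
 * Assumption: the kernel's jiffies_to_msecs() has per-HZ special cases;
 * this sketch uses plain integer arithmetic and takes hz as a parameter. */
#define MODEL_MSEC_PER_SEC 1000UL

unsigned long model_jiffies_to_msecs(unsigned long j, unsigned long hz)
{
	/* hz jiffies make up one second, so j jiffies are j * 1000 / hz ms. */
	return (j * MODEL_MSEC_PER_SEC) / hz;
}
```

Under this model, passing j equal to hz yields 1000 for any tick rate, so the patched line amounts to a one-second busy-wait.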
>
> But invoking schedule_timeout_uninterruptible(HZ) implies a quiescent state,
> so it will not cause an RCU stall, even though we are still in the RCU read-side critical section (PREEMPT_COUNT=y).
>
> No RCU stall occurred when I tested with the following parameters:
> rcutorture.stall_cpu=30
> rcutorture.stall_no_softlockup=1
> rcutorture.stall_cpu_irqsoff=1
> rcutorture.stall_cpu_block=1
>
>Understood. If you want that RCU CPU stall in a CONFIG_PREEMPTION=n
>kernel, you should not use rcutorture.stall_cpu_block=1.
>
>In a CONFIG_PREEMPTION=y kernel, rcutorture.stall_cpu_block=1 forces
>the grace period to be stalled on a task rather than a CPU, exercising
>a different part of the RCU CPU stall warning code.
>
>In a CONFIG_PREEMPTION=n kernel, using rcutorture.stall_cpu_block=1
>forces the CPU to go through a quiescent state, as you say. It can
>also cause lockdep and scheduling-while-atomic complaints, depending on
>exactly what type of RCU reader is in effect.
>
>So these are test-the-diagnostics parameters. The mdelay() instead
>makes rcutorture.stall_cpu_block=1 do the same thing as does
>rcutorture.stall_cpu_block=0 for CONFIG_PREEMPTION=n kernels, right?

Yes. Maybe we can expand the description of stall_cpu_block in kernel-parameters.txt.
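
The behavior described above for the current (unpatched) code can be summarized as a small decision table. The sketch below is a user-space model only; the enum labels and function names are hypothetical and do not exist in rcutorture.c.

```c
#include <stdbool.h>

/* Hypothetical labels for illustration; these are not kernel identifiers. */
enum stall_outcome {
	STALL_ON_CPU,		/* grace period stalled on a CPU */
	STALL_ON_TASK,		/* grace period stalled on a task */
	QUIESCENT_NO_STALL,	/* blocking passes through a quiescent state */
};

/* Model of how rcutorture.stall_cpu_block interacts with CONFIG_PREEMPTION,
 * as described in this thread for the unpatched code. */
enum stall_outcome model_stall_outcome(bool config_preemption,
				       bool stall_cpu_block)
{
	if (!stall_cpu_block)
		return STALL_ON_CPU;
	/* With blocking requested, the outcome depends on preemptibility.
	 * In the non-preemptible case the thread also notes possible lockdep
	 * and scheduling-while-atomic complaints, depending on reader type. */
	return config_preemption ? STALL_ON_TASK : QUIESCENT_NO_STALL;
}

/* Human-readable description of each modeled outcome. */
const char *model_outcome_name(enum stall_outcome o)
{
	switch (o) {
	case STALL_ON_CPU:
		return "RCU CPU stall, grace period stalled on a CPU";
	case STALL_ON_TASK:
		return "RCU CPU stall, grace period stalled on a task";
	default:
		return "quiescent state reached, no stall expected";
	}
}
```

For the reported configuration, model_stall_outcome(false, true) gives QUIESCENT_NO_STALL, which matches the missing stall warning in the test run. With the proposed mdelay() change, that case would instead behave like stall_cpu_block=0.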


>
> Thanx, Paul
>
> Thanks
> Zqiang
>
> >
> >So should there be a change to kernel-parameters.txt to make it
> >more clear that this is intended behavior?

Agreed.

Thanks
Zqiang

> >
> > Thanx, Paul
> >
> > #endif
> > } else if (stall_no_softlockup) {
> > touch_softlockup_watchdog();
> > --
> > 2.25.1
> >