Re: [linus:master] [maple_tree] 120b116208: INFO:task_blocked_for_more_than#seconds

From: Paul E. McKenney
Date: Tue Jan 31 2023 - 15:52:29 EST


On Tue, Jan 31, 2023 at 03:45:20PM -0500, Liam R. Howlett wrote:
> * Paul E. McKenney <paulmck@xxxxxxxxxx> [230131 15:26]:
> > On Tue, Jan 31, 2023 at 03:18:22PM +0800, kernel test robot wrote:
> > > Hi Liam,
> > >
> > > We caught a "task blocked" dmesg in maple tree test. Not sure if this
> > > is expected for maple tree test, so we are sending this report for
> > > your information. Thanks.
> > >
> > > Greeting,
> > >
> > > FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with clang-14):
> > >
> > > commit: 120b116208a0877227fc82e3f0df81e7a3ed4ab1 ("maple_tree: reorganize testing to restore module testing")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > in testcase: boot
> > >
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > [ 17.318428][ T1] calling maple_tree_seed+0x0/0x15d0 @ 1
> > > [ 17.319219][ T1]
> > > [ 17.319219][ T1] TEST STARTING
> > > [ 17.319219][ T1]
> > > [ 999.249871][ T23] INFO: task rcu_scale_shutd:59 blocked for more than 491 seconds.
> > > [ 999.253363][ T23] Not tainted 6.1.0-rc4-00003-g120b116208a0 #1
> > > [ 999.254249][ T23] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [ 999.255390][ T23] task:rcu_scale_shutd state:D stack:30968 pid:59 ppid:2 flags:0x00004000
> > > [ 999.256934][ T23] Call Trace:
> > > [ 999.257418][ T23] <TASK>
> > > [ 999.257900][ T23] __schedule+0x169b/0x1f90
> > > [ 999.261677][ T23] schedule+0x151/0x300
> > > [ 999.262281][ T23] ? compute_real+0xe0/0xe0
> > > [ 999.263364][ T23] rcu_scale_shutdown+0xdd/0x130
> > > [ 999.264093][ T23] ? wake_bit_function+0x2c0/0x2c0
> > > [ 999.268985][ T23] kthread+0x309/0x3a0
> > > [ 999.269958][ T23] ? compute_real+0xe0/0xe0
> > > [ 999.270552][ T23] ? kthread_unuse_mm+0x200/0x200
> > > [ 999.271281][ T23] ret_from_fork+0x1f/0x30
> > > [ 999.272385][ T23] </TASK>
> > > [ 999.272865][ T23]
> > > [ 999.272865][ T23] Showing all locks held in the system:
> > > [ 999.273988][ T23] 2 locks held by swapper/0/1:
> > > [ 999.274684][ T23] 1 lock held by khungtaskd/23:
> > > [ 999.275400][ T23] #0: ffffffff88346e00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x8/0x30
> > > [ 999.277171][ T23]
> > > [ 999.277525][ T23] =============================================
> > > [ 999.277525][ T23]
> > > [ 1049.050884][ T1] maple_tree: 12610686 of 12610686 tests passed
> > >
> > >
> > > If you fix the issue, kindly add following tag
> > > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > > | Link: https://lore.kernel.org/oe-lkp/202301310940.4a37c7af-yujie.liu@xxxxxxxxx
> >
> > Liam brought this to my attention on IRC, and it looks like the root
> > cause is that the rcuscale code does not deal gracefully with grace
> > periods that are in much excess of a second in duration.
> >
> > Now, it might well be worth looking into why the grace periods were taking
> > that long, but if you were running Maple Tree stress tests concurrently
> > with rcuscale, this might well be expected behavior.
> >
>
> This could be simply cpu starvation causing no foward progress in your
> tests with the number of concurrent running tests and "-smp 2".
>
> It's also worth noting that building in the rcu test module makes the
> machine turn off once the test is complete. This can be seen in your
> console message:
> [ 13.254240][ T1] rcu-scale:--- Start of test: nreaders=2 nwriters=2 verbose=1 shutdown=1
>
> so your machine may not have finished running through the array of tests
> you have specified to build in - which is a lot. I'm not sure if this
> is the best approach considering the load that produces on the system
> and how difficult it is (was) to figure out which test is causing a
> stall, or other issue.

Agreed, both rcuscale and refscale when built in turn the machine off at
the end of the test. For providing background stress for some other test
(in this case Maple Tree tests), rcutorture, locktorture, or scftorture
might be better choices.

Thanx, Paul