Re: linux-next: Tree for April 14 (Call-traces: RCU/ACPI/WQ related?)

From: Sedat Dilek
Date: Thu Apr 21 2011 - 06:25:06 EST


On Thu, Apr 21, 2011 at 11:07 AM, Sedat Dilek
<sedat.dilek@xxxxxxxxxxxxxx> wrote:
> On Thu, Apr 21, 2011 at 7:08 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>> On Thu, Apr 14, 2011 at 03:44:11PM -0700, Paul E. McKenney wrote:
>>> On Fri, Apr 15, 2011 at 12:19:34AM +0200, Sedat Dilek wrote:
>>> > On Thu, Apr 14, 2011 at 12:19 PM, Sedat Dilek
>>> > <sedat.dilek@xxxxxxxxxxxxxx> wrote:
>>> > > On Thu, Apr 14, 2011 at 11:16 AM, Sedat Dilek
>>> > > <sedat.dilek@xxxxxxxxxxxxxx> wrote:
>>> > >> [ Adding CC to RCU maintainer (Hi Paul :-)) ]
>>> > >>
>>> > >> What helps me for now (see also Documentation/RCU/stallwarn.txt):
>>> > >>
>>> > >> # cat /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>>> > >> 0
>>> > >>
>>> > >> # echo "1" > /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>>> > >>
>>> > >> # cat /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
>>> > >> 1
>>> > >>
>>> > >> - Sedat -
>>> > >>
>>> > >
>>> > > That workaround helped until the system froze while generating a
>>> > > tarball from my current kernel tree.
>>> > > I switched back to yesterday's linux-next kernel.
>>> > >
>>> > > - Sedat -
>>> > >
>>> >
>>> > I isolated the culprit so far:
>>> >
>>> > commit 900507fc62d5ba0164c07878dbc36ac97866a858
>>> > "rcu: move TREE_RCU from softirq to kthread"
>>> >
>>> > With this revert my system does not show the symptoms I have reported.
>>>
>>> Hmmm...  I never was able to reproduce this, but did find a workload
>>> that slowed up the grace periods.  I fixed that (which turned out to
>>> be a wakeup problem), but my hopes that it would also fix your problem
>>> were clearly unfounded.  I have once again stopped exporting this commit
>>> to -next.
>>
>> I have added some debug tracing, which is available at branch
>> "sedat.2011.04.19a" in the git repository at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
>>
>> Alternatively, if it is easier, the patch shown below can be used.  FWIW,
>> this patch is against 2.6.39-rc3.
>>
>> Either way, if you get a chance to run your tests on this, could you
>> please run the attached script (collectdebugfs.sh) and capture its output?
>> Sample output is attached as well (collectdebugfs.sh.out): the script
>> should output something vaguely like the sample output every 15 seconds
>> or so.
>>
>> The script assumes that debugfs is enabled (along with CONFIG_RCU_TRACE=y)
>> and mounted as follows:
>>
>>         mount -t debugfs none /sys/kernel/debug/
>>
>> Or if you mount debugfs somewhere else, please set the script's DEBUGFS_MP
>> variable accordingly.
>>
>>                                                         Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>
> Welcome to operation "Kill that RCU brainbug" (Starship troopers part X)!
>
> Of course I can help with testing.
>
> Paul, did you see recent RCU-related fixes to fs between rc3 and rc4?
>
> commit c1530019e311c91d14b24d8e74d233152d806e45
> vfs: Fix absolute RCU path walk failures due to uninitialized seq number
>
> fff3e5ade4455a4b42a19c95dd7a167a3cb7956a
> fs: synchronize_rcu when unregister_filesystem success not failure
>
> IIRC, Jens has pending block/plugging patches in his for-linus tree.
> Especially this one (CONFIG_PREEMPT):
>
> 5f45c69589b7d2953584e6cd0b31e35dbe960ad0
> cfq-iosched: read_lock() does not always imply rcu_read_lock()
>
> Some questions about the test scenario:
>
> Shall I test from linux-2.6-rcu.git#sedat.2011.04.19a GIT tree?
> I think that's the ideal solution.
> Or shall I pull sedat.2011.04.19a GIT branch into "BROKEN" linux-next
> (next-20110414)?
>
> Again, with which RCU/HZ/PREEMPT kernel-config options shall I test?
> This is from yesterday's linux-next:
>
> # egrep 'RCU|_HZ |PREEMPT' /boot/config-2.6.39-rc4-next20110420.4-686-small
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_FANOUT=32
> # CONFIG_RCU_FANOUT_EXACT is not set
> CONFIG_RCU_FAST_NO_HZ=y
> CONFIG_TREE_RCU_TRACE=y
> # CONFIG_PREEMPT_NONE is not set
> CONFIG_PREEMPT_VOLUNTARY=y
> # CONFIG_PREEMPT is not set
> # CONFIG_SPARSE_RCU_POINTER is not set
> CONFIG_RCU_TORTURE_TEST=m
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
>
> Regards,
> - Sedat -
>

Looks like you want me to test with RCU_BOOST and RCU_TORTURE_TEST :-).

Attached is collectdebugfs-dileks.log, my current kernel-config and a
build-script to generate Debian packages.

$ LANG=C ./collectdebugfs.sh 2>&1 | tee collectdebugfs-dileks.log
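
As quoted above, the script needs debugfs mounted (the quoted example uses
/sys/kernel/debug/, otherwise DEBUGFS_MP has to be set). A minimal sketch of
a check to run beforehand, assuming a Linux /proc/mounts; the function name
debugfs_mountpoint is hypothetical and is not part of collectdebugfs.sh:

```shell
#!/bin/sh
# Print the first debugfs mount point found in a mounts table.
# $1: file in /proc/mounts format (defaults to /proc/mounts).
# Returns non-zero if no debugfs entry is listed.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}" 2>/dev/null
}

# Report the mount point, or repeat the mount command quoted above.
if mp=$(debugfs_mountpoint); then
    echo "debugfs mounted at $mp"
else
    echo "debugfs not mounted; try: mount -t debugfs none /sys/kernel/debug/"
fi
```

If the reported mount point differs from /sys/kernel/debug, that path is
what DEBUGFS_MP should be set to.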

I will do a 2nd run with PREEMPT_RCU enabled.

- Sedat -
Thu Apr 21 12:19:34 CEST 2011
rcu_sched_state: completed=12056 gpnum=12057 age=150928 max=56
rcu_bh_state: completed=-298 gpnum=4294966998 age=0 max=1
rcu_sched:
c=12056 g=12057 s=3 jfq=3 j=7aeb nfqs=625021/nfqsng=0(625021) fqlh=0
1/1 ..>. 0:31 ^0
1/1 ..>. 0:15 ^0 0/0 ..>. 16:31 ^1
rcu_bh:
c=4294966998 g=4294966998 s=0 jfq=-167826 j=7aeb nfqs=0/nfqsng=0(0) fqlh=0
0/1 ..>. 0:31 ^0
0/1 ..>. 0:15 ^0 0/0 ..>. 16:31 ^1
rcu_sched:
0 c=12056 g=12057 pq=1 pqc=12056 qp=1 dt=50197/1/0 df=0 of=0 ri=619195 ql=317864 qs=N.W. kt=1/W b=2147483647 ci=209844 co=0 ca=0
rcu_bh:
0 c=4294966998 g=4294966998 pq=1 pqc=4294966997 qp=0 dt=50197/1/0 df=0 of=0 ri=0 ql=0 qs=.... kt=1/W b=10 ci=6 co=0 ca=0
rcu_sched:
0 np=161772 qsp=9 rpq=157538 cbr=2 cng=1381 gpc=839 gps=0 nf=0 nn=2012
rcu_bh:
0 np=2012 qsp=0 rpq=238 cbr=0 cng=0 gpc=0 gps=0 nf=0 nn=1774
rcutorture test sequence: 0
rcutorture update version number: 0
cat: /sys/kernel/debug/rcu/rcuboost: No such file or directory
no rcuboost
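
A note on reading the two *_state lines in the log above: for rcu_bh,
completed=-298 and gpnum=4294966998 are the same 32-bit value
(2^32 - 298 = 4294966998), presumably printed once as signed and once as
unsigned. For rcu_sched, gpnum is one ahead of completed, which would
normally mean a grace period is in progress. A rough sketch for comparing
the two counters, assuming the line format shown in this log and a shell
whose arithmetic is wider than 32 bits; the helper names to_u32 and
gp_status are hypothetical:

```shell
#!/bin/sh
# Reinterpret a (possibly negative) counter as an unsigned 32-bit value.
# Assumes shell arithmetic is wider than 32 bits (true for bash and dash
# on current systems).
to_u32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# Read "<name>_state: completed=C gpnum=G ..." lines on stdin and print
# "<name> idle" if both counters match, "<name> in-progress" otherwise.
gp_status() {
    sed -n 's/^\(rcu_[a-z]*\)_state: completed=\(-\{0,1\}[0-9]*\) gpnum=\(-\{0,1\}[0-9]*\).*/\1 \2 \3/p' |
    while read -r name c g; do
        if [ "$(to_u32 "$c")" = "$(to_u32 "$g")" ]; then
            echo "$name idle"
        else
            echo "$name in-progress"
        fi
    done
}
```

Usage: gp_status < collectdebugfs-dileks.log
For the single sample above this prints "rcu_sched in-progress" and
"rcu_bh idle".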

Attachment: config-2.6.39-rc3-rcu-sedat.2011.04.19a+
Description: Binary data

Attachment: build_linux-2.6-rcu.sh
Description: Bourne shell script