Re: [rcu] [ INFO: suspicious RCU usage. ]

From: Krzysztof Kozlowski
Date: Tue Feb 03 2015 - 05:01:55 EST


On sob, 2015-01-31 at 18:59 -0800, Fengguang Wu wrote:
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git revert-c418b8035fac0cc7d242e5de126cec1006a34bed-dd2b39be8eee9d175c7842c30e405a5cbe50095a

On next-20150203 I hit similar error on ARM/Exynos4412 (Trats2 board)
while suspending to RAM:

[ 30.986262] PM: Syncing filesystems ... done.
[ 30.994661] PM: Preparing system for mem sleep
[ 31.002064] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 31.008629] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 31.016325] PM: Entering mem sleep
[ 31.016338] Suspending console(s) (use no_console_suspend to debug)
[ 31.051009] random: nonblocking pool is initialized
[ 31.085811] wake enabled for irq 102
[ 31.086964] wake enabled for irq 123
[ 31.086972] wake enabled for irq 124
[ 31.090409] PM: suspend of devices complete after 59.684 msecs
[ 31.090524] CAM_ISP_CORE_1.2V: No configuration
[ 31.090534] VMEM_VDDF_3.0V: No configuration
[ 31.090543] VCC_SUB_2.0V: No configuration
[ 31.090552] VCC_SUB_1.35V: No configuration
[ 31.090562] VMEM_1.2V_AP: No configuration
[ 31.090587] MOTOR_VCC_3.0V: No configuration
[ 31.090596] LCD_VCC_3.3V: No configuration
[ 31.090605] TSP_VDD_1.8V: No configuration
[ 31.090614] TSP_AVDD_3.3V: No configuration
[ 31.090623] VMEM_VDD_2.8V: No configuration
[ 31.090631] VTF_2.8V: No configuration
[ 31.090640] VDDQ_PRE_1.8V: No configuration
[ 31.090649] VT_CAM_1.8V: No configuration
[ 31.090658] CAM_ISP_SEN_IO_1.8V: No configuration
[ 31.090667] CAM_SENSOR_CORE_1.2V: No configuration
[ 31.090677] VHSIC_1.8V: No configuration
[ 31.090685] VHSIC_1.0V: No configuration
[ 31.090694] VABB2_1.95V: No configuration
[ 31.090703] NFC_AVDD_1.8V: No configuration
[ 31.090712] VUOTG_3.0V: No configuration
[ 31.090721] VABB1_1.95V: No configuration
[ 31.090730] VMIPI_1.8V: No configuration
[ 31.090739] CAM_ISP_MIPI_1.2V: No configuration
[ 31.090747] VMIPI_1.0V: No configuration
[ 31.090756] VPLL_1.0V_AP: No configuration
[ 31.090765] VMPLL_1.0V_AP: No configuration
[ 31.090773] VCC_1.8V_IO: No configuration
[ 31.090782] VCC_2.8V_AP: No configuration
[ 31.090791] VCC_1.8V_AP: No configuration
[ 31.090800] VM1M2_1.2V_AP: No configuration
[ 31.090809] VALIVE_1.0V_AP: No configuration
[ 31.100297] PM: late suspend of devices complete after 9.445 msecs
[ 31.108891] PM: noirq suspend of devices complete after 8.577 msecs
[ 31.109052] Disabling non-boot CPUs ...
[ 31.113921]
[ 31.113925] ===============================
[ 31.113928] [ INFO: suspicious RCU usage. ]
[ 31.113935] 3.19.0-rc7-next-20150203 #1914 Not tainted
[ 31.113938] -------------------------------
[ 31.113943] kernel/sched/fair.c:4740 suspicious rcu_dereference_check() usage!
[ 31.113946]
[ 31.113946] other info that might help us debug this:
[ 31.113946]
[ 31.113952]
[ 31.113952] RCU used illegally from offline CPU!
[ 31.113952] rcu_scheduler_active = 1, debug_locks = 0
[ 31.113957] 3 locks held by swapper/1/0:
[ 31.113988] #0: ((cpu_died).wait.lock){......}, at: [<c005a114>] complete+0x14/0x44
[ 31.114012] #1: (&p->pi_lock){-.-.-.}, at: [<c004a790>] try_to_wake_up+0x28/0x300
[ 31.114035] #2: (rcu_read_lock){......}, at: [<c004f1b8>] select_task_rq_fair+0x5c/0xa04
[ 31.114038]
[ 31.114038] stack backtrace:
[ 31.114046] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150203 #1914
[ 31.114050] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 31.114076] [<c0014ce4>] (unwind_backtrace) from [<c0011c30>] (show_stack+0x10/0x14)
[ 31.114091] [<c0011c30>] (show_stack) from [<c04dc048>] (dump_stack+0x70/0xbc)
[ 31.114105] [<c04dc048>] (dump_stack) from [<c004f83c>] (select_task_rq_fair+0x6e0/0xa04)
[ 31.114118] [<c004f83c>] (select_task_rq_fair) from [<c004a83c>] (try_to_wake_up+0xd4/0x300)
[ 31.114129] [<c004a83c>] (try_to_wake_up) from [<c00598a0>] (__wake_up_common+0x4c/0x80)
[ 31.114140] [<c00598a0>] (__wake_up_common) from [<c00598e8>] (__wake_up_locked+0x14/0x1c)
[ 31.114150] [<c00598e8>] (__wake_up_locked) from [<c005a134>] (complete+0x34/0x44)
[ 31.114167] [<c005a134>] (complete) from [<c04d6ca4>] (cpu_die+0x24/0x84)
[ 31.114179] [<c04d6ca4>] (cpu_die) from [<c005a508>] (cpu_startup_entry+0x328/0x358)
[ 31.114189] [<c005a508>] (cpu_startup_entry) from [<40008784>] (0x40008784)
[ 31.114226] CPU1: shutdown
[ 31.132479] CPU2: shutdown
[ 31.146815] CPU3: shutdown
[ 31.160767] Enabling non-boot CPUs ...
[ 31.175645] CPU1 is up
[ 31.191120] CPU2 is up
[ 31.206650] CPU3 is up
[ 31.206922] s3c-i2c 13860000.i2c: slave address 0x10
[ 31.206935] s3c-i2c 13860000.i2c: bus frequency set to 390 KHz
[ 31.206952] s3c-i2c 13890000.i2c: slave address 0x10
[ 31.206962] s3c-i2c 13890000.i2c: bus frequency set to 390 KHz
[ 31.206978] s3c-i2c 138d0000.i2c: slave address 0x10
[ 31.206987] s3c-i2c 138d0000.i2c: bus frequency set to 97 KHz
[ 31.209201] PM: noirq resume of devices complete after 2.539 msecs
[ 31.212202] PM: early resume of devices complete after 2.812 msecs
[ 31.229844] Failed to resume regulators from suspend (-22)
[ 31.230915] wake disabled for irq 123
[ 31.230923] wake disabled for irq 124
[ 31.232003] wake disabled for irq 102
[ 31.259950] max77686_rtc_tm_to_data: MAX77686 RTC cannot handle the year 1970.Assume it's 2000.
[ 31.298929] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 400000Hz, actual 396825HZ div = 63)
[ 31.526729] mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
[ 31.526976] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 52000000Hz, actual 50000000HZ div = 1)
[ 31.527207] PM: resume of devices complete after 297.352 msecs
[ 31.985665] PM: Finishing wakeup.
[ 31.988959] Restarting tasks ... done.
root@target:~#

Best regards,
Krzysztof

>
> commit dd2b39be8eee9d175c7842c30e405a5cbe50095a
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> AuthorDate: Wed Jan 28 14:42:09 2015 -0800
> Commit: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> CommitDate: Fri Jan 30 12:59:22 2015 -0800
>
> rcu: Handle outgoing CPUs on exit from idle loop
>
> This commit informs RCU of an outgoing CPU just before that CPU invokes
> arch_cpu_idle_dead() during its last pass through the idle loop (via a
> new CPU_DYING_IDLE notifier value). This change means that RCU need not
> deal with outgoing CPUs passing through the scheduler after informing
> RCU that they are no longer online. Note that removing the CPU from
> the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
> and orphaning callbacks is still done at CPU_DEAD time, the reason being
> that at CPU_DEAD time we have another CPU that can adopt them.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> +-------------------------------------+------------+------------+------------+
> | | 8c50d7da91 | dd2b39be8e | d586642522 |
> +-------------------------------------+------------+------------+------------+
> | boot_successes | 198 | 11 | 51 |
> | boot_failures | 0 | 55 | 15 |
> | INFO:suspicious_RCU_usage | 0 | 55 | 15 |
> | RCU_used_illegally_from_offline_CPU | 0 | 55 | 15 |
> | backtrace:cpu_startup_entry | 0 | 55 | 15 |
> | BUG:kernel_test_hang | 0 | 0 | 4 |
> +-------------------------------------+------------+------------+------------+
>
> [ 15.244825] numa_remove_cpu cpu 0 node 0: mask now 1
> [ 15.246713]
> [ 15.246917] ===============================
> [ 15.247424] [ INFO: suspicious RCU usage. ]
> [ 15.247964] 3.19.0-rc1-gdd2b39b #10 Not tainted
> [ 15.248531] -------------------------------
> [ 15.248586] include/trace/events/rcu.h:35 suspicious rcu_dereference_check() usage!
> [ 15.248586]
> [ 15.248586] other info that might help us debug this:
> [ 15.248586]
> [ 15.248586]
> [ 15.248586] RCU used illegally from offline CPU!
> [ 15.248586] rcu_scheduler_active = 1, debug_locks = 0
> [ 15.248586] no locks held by swapper/0/0.
> [ 15.248586]
> [ 15.248586] stack backtrace:
> [ 15.248586] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc1-gdd2b39b #10
> [ 15.248586] 0000000000000001 ffffffff81e03e08 ffffffff8171b89b 0000000000000000
> [ 15.248586] ffffffff81e0e580 ffffffff81e03e38 ffffffff810efec2 ffffffff81e4b140
> [ 15.248586] ffffffff81c77ba0 0000000000000002 ffffffff81e11e98 ffffffff81e03e58
> [ 15.248586] Call Trace:
> [ 15.248586] [<ffffffff8171b89b>] dump_stack+0x7f/0xa7
> [ 15.248586] [<ffffffff810efec2>] lockdep_rcu_suspicious+0x107/0x110
> [ 15.248586] [<ffffffff81111363>] trace_rcu_utilization+0x127/0x133
> [ 15.248586] [<ffffffff8111291e>] rcu_cpu_notify+0x527/0x53b
> [ 15.248586] [<ffffffff810e9722>] cpu_startup_entry+0x1dc/0x4ea
> [ 15.248586] [<ffffffff8170eb5d>] rest_init+0x159/0x15f
> [ 15.248586] [<ffffffff8237b2da>] start_kernel+0x565/0x572
> [ 15.248586] [<ffffffff8237a120>] ? early_idt_handlers+0x120/0x120
> [ 15.248586] [<ffffffff8237a4e4>] x86_64_start_reservations+0x41/0x43
> [ 15.248586] [<ffffffff8237a623>] x86_64_start_kernel+0x13d/0x14c
> [ 15.265151] CPU 0 is now offline
> [ 15.265941] debug: unmapping init [mem 0xffffffff82365000-0xffffffff82539fff]
> [ 15.266726] Write protecting the kernel read-only data: 14336k
>
> git bisect start d58664252218cfefb19709d597ff0c5d93688203 26bc420b59a38e4e6685a73345a0def461136dce --
> git bisect bad 19f7d9c2f948a4c5c7742adb16fe00920f35f302 # 13:29 23- 6 Merge 'jtkirshe-net-next/i40e-queue' into devel-roam-smoke-201501311226
> git bisect bad 2c86978183cc365003e2d6949052a30865ef8b89 # 13:33 34- 32 Merge 'wsa/i2c/for-next' into devel-roam-smoke-201501311226
> git bisect good 1ffdd3662d27b1d4d59d51bbcc104b200be63d6a # 13:37 66+ 0 Merge 'pci/pci/virtualization' into devel-roam-smoke-201501311226
> git bisect bad 0ce6ea6707a3d5ae5bfdbdc4d16ebc86cff77f5f # 13:43 32- 22 Merge 'rcu/rcu/next' into devel-roam-smoke-201501311226
> git bisect good 53805a9f2fa76294af534fb7e9f96d43f1d820eb # 13:52 66+ 0 Merge 'iio/testing' into devel-roam-smoke-201501311226
> git bisect good 78e691f4ae2d5edea0199ca802bb505b9cdced88 # 14:01 66+ 0 Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
> git bisect good 17366dc8dc49858ba931c4120d8de494e388d93e # 14:05 66+ 0 documentation: Update rcutree.kthread_prio for grace-period kthread use
> git bisect good 569c1500e44189136c8a9f4b5e39f0055e422b0d # 14:14 66+ 0 documentation: Update based on on-demand vmstat workers
> git bisect good 14fefdb410cf48327c972ce91deb5e98edc8671f # 14:18 66+ 0 rcu: Eliminate ->onoff_mutex from rcu_node structure
> git bisect bad dd2b39be8eee9d175c7842c30e405a5cbe50095a # 14:29 11- 55 rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d # 14:34 66+ 0 cpu: Make CPU-offline idle-loop transition point more precise
> # first bad commit: [dd2b39be8eee9d175c7842c30e405a5cbe50095a] rcu: Handle outgoing CPUs on exit from idle loop
> git bisect good 8c50d7da9124a9f1e92e13996a0a148b2431390d # 14:37 198+ 0 cpu: Make CPU-offline idle-loop transition point more precise
> # extra tests with DEBUG_INFO
> git bisect good dd2b39be8eee9d175c7842c30e405a5cbe50095a # 15:35 198+ 198 rcu: Handle outgoing CPUs on exit from idle loop
> # extra tests on HEAD of linux-devel/devel-roam-smoke-201501311226
> git bisect bad d58664252218cfefb19709d597ff0c5d93688203 # 15:35 0- 15 0day head guard for 'devel-roam-smoke-201501311226'
> # extra tests on tree/branch rcu/rcu/next
> git bisect bad c418b8035fac0cc7d242e5de126cec1006a34bed # 15:52 47- 21 cpu: Stop newly offlined CPU from using RCU readers
> # extra tests with first bad commit reverted
> # extra tests on tree/branch linus/master
> git bisect good 2141fd018156db0f29efb384f4d99ead23b48f18 # 16:04 198+ 0 Merge tag 'char-misc-3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
> # extra tests on tree/branch next/master
> git bisect good 827e3bdf1bb2401c1a1e5586eb3977d228d298b2 # 16:12 198+ 0 Add linux-next specific files for 20150130
>
>
> This script may reproduce the error.
>
> ----------------------------------------------------------------------------
> #!/bin/bash
>
> kernel=$1
> initrd=quantal-core-x86_64.cgz
>
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
>
> kvm=(
> qemu-system-x86_64
> -cpu kvm64
> -enable-kvm
> -kernel $kernel
> -initrd $initrd
> -m 320
> -smp 2
> -net nic,vlan=1,model=e1000
> -net user,vlan=1
> -boot order=nc
> -no-reboot
> -watchdog i6300esb
> -rtc base=localtime
> -serial stdio
> -display none
> -monitor null
> )
>
> append=(
> hung_task_panic=1
> earlyprintk=ttyS0,115200
> debug
> apic=debug
> sysrq_always_enabled
> rcupdate.rcu_cpu_stall_timeout=100
> panic=-1
> softlockup_panic=1
> nmi_watchdog=panic
> oops=panic
> load_ramdisk=2
> prompt_ramdisk=0
> console=ttyS0,115200
> console=tty0
> vga=normal
> root=/dev/ram0
> rw
> drbd.minor_count=8
> )
>
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
>
> Thanks,
> Fengguang
> _______________________________________________
> LKP mailing list
> LKP@xxxxxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/