Re: [v2] timers: Fix excessive granularity of new timers after a nohz idle

From: jeffy
Date: Tue Aug 22 2017 - 23:22:29 EST


Hi guys,

I was testing a arm64 based device(chromebook bob), and the kernel hang a lot(during booting or suspending) on for-next(next-20170822) with this commit:
71acb768f5b3 (timers: Fix excessive granularity of new timers after a nohz idle)

After revert it(or just add "!base->is_idle" check back), things work well...


my dts:
arch/arm64/boot/dts/rockchip/rk3399-gru-kevin.dts

my cpu info:
localhost / # cat /proc/cpuinfo
processor : 0
model name : ARMv8 Processor rev 4 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
model name : ARMv8 Processor rev 4 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
model name : ARMv8 Processor rev 4 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
model name : ARMv8 Processor rev 4 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 4
model name : ARMv8 Processor rev 2 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 2

processor : 5
model name : ARMv8 Processor rev 2 (v8l)
BogoMIPS : 48.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpa
e evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 2


my hang dmesg:
[ 304.797633] EXT4-fs (mmcblk1p3): mounted filesystem without journal. Opts: (null
)
[ 304.806070] VFS: Mounted root (ext2 filesystem) on device 179:3.
[ 304.815775] devtmpfs: mounted
[ 304.823110] Freeing unused kernel memory: 2048K
[ 304.891944] audit: type=1404 audit(304.617:2): enforcing=1 old_enforcing=0 auid=
4294967295 ses=4294967295
[ 364.655978] INFO: rcu_preempt self-detected stall on CPU
[ 364.661924] 3-...: (23 GPs behind) idle=486/2/0 softirq=39/40 fqs=14991
[ 364.669513] (t=60000 jiffies g=-244 c=-245 q=71)
[ 364.674872] Task dump for CPU 3:
[ 364.678478] swapper/3 R running task 0 0 1 0x00000022
[ 364.686365] Call trace:
[ 364.689095] [<ffffff8008089ed4>] dump_backtrace+0x0/0x220
[ 364.695133] [<ffffff800808a118>] show_stack+0x24/0x30
[ 364.700784] [<ffffff80080d84fc>] sched_show_task+0x138/0x168
[ 364.707112] [<ffffff80080d96bc>] dump_cpu_task+0x48/0x54
[ 364.713054] [<ffffff800810ce80>] rcu_dump_cpu_stacks+0xa0/0xe0
[ 364.719577] [<ffffff800810b9bc>] rcu_check_callbacks+0x328/0x934
[ 364.726295] [<ffffff8008112b20>] update_process_times+0x34/0x5c
[ 364.732919] [<ffffff8008123164>] tick_sched_handle.isra.3+0x54/0x68
[ 364.739928] [<ffffff80081231c0>] tick_sched_timer+0x48/0x84
[ 364.746160] [<ffffff8008113b40>] __hrtimer_run_queues+0x194/0x31c
[ 364.752974] [<ffffff8008113eb0>] hrtimer_interrupt+0xac/0x1bc
[ 364.759402] [<ffffff80086152a4>] arch_timer_handler_phys+0x3c/0x4c
[ 364.766316] [<ffffff80080fe838>] handle_percpu_devid_irq+0xf4/0x270
[ 364.773324] [<ffffff80080f8df0>] generic_handle_irq+0x2c/0x44
[ 364.779749] [<ffffff80080f94b0>] __handle_domain_irq+0x90/0xb8
[ 364.786271] [<ffffff8008080ef8>] gic_handle_irq+0x140/0x1a8
[ 364.792502] Exception stack(0xffffff800801bba0 to 0xffffff800801bce0)
[ 364.799696] bba0: 000000000000b7e9 0000000000000003 00000040eee72000 ffffffc0f7d
601f8
[ 364.808456] bbc0: 000000000000003f ffffffc0f7d5efb8 ffffff800801bde8 80000000000
00000
[ 364.817215] bbe0: 0000000000000001 0000000000000000 ffffff8009028338 00000000000
00000
[ 364.825975] bc00: 0000000000000105 0000000000000130 0000000000005dc0 ffffff8008a
58000
[ 364.834734] bc20: 0000000000000007 000000000000000e 000000000000005c ffffffc0002
18000
[ 364.843494] bc40: ffffff8009007000 ffffffc000218058 ffffff80080c1034 ffffffc0002
18000
[ 364.852253] bc60: ffffff800801bde0 ffffffc000218000 000000000000000a 00000000000
00001
[ 364.861013] bc80: ffffff80090e8000 ffffff800801bce0 ffffff8008885c00 ffffff80080
1bce0
[ 364.869773] bca0: ffffff8008885c04 0000000060000145 ffffffc000218058 ffffff80080
c1034
[ 364.878532] bcc0: 0000008000000000 ffffff8008885c00 ffffff800801bce0 ffffff80088
85c04
[ 364.887291] [<ffffff80080828f0>] el1_irq+0xb0/0x124
[ 364.892747] [<ffffff8008885c04>] _raw_spin_unlock_irq+0x24/0x50
[ 364.899368] [<ffffff80080c10a0>] idle_worker_timeout+0x6c/0x78
[ 364.905892] [<ffffff8008111924>] call_timer_fn+0xb0/0x1c8
[ 364.911929] [<ffffff8008111b48>] expire_timers+0x10c/0x158
[ 364.918055] [<ffffff8008111d54>] run_timer_softirq.part.1+0xb0/0xe8
[ 364.925062] [<ffffff8008111e50>] run_timer_softirq+0xc4/0xec
[ 364.931381] [<ffffff80080810ec>] __do_softirq+0x18c/0x348
[ 364.937419] [<ffffff80080af650>] irq_exit+0x7c/0xc4
[ 364.942872] [<ffffff80080f94b4>] __handle_domain_irq+0x94/0xb8
[ 364.949393] [<ffffff8008080ef8>] gic_handle_irq+0x140/0x1a8
[ 364.955623] Exception stack(0xffffff80092ebe10 to 0xffffff80092ebf50)
[ 364.962826] be00: 00000040eee72000 ffffff8008e
e8000
[ 364.971586] be20: 0000000000000000 ffffff80092ebf60 00000040eee72000 ffffff80090
2f900
[ 364.980345] be40: 00000046de94b940 0000000000000040 ffffffc0f2d35360 ffffff80092
ebed0
[ 364.989105] be60: 0000000000000a80 0000000000000003 0000000000003c93 00000000000
03c93
[ 364.997864] be80: 00000000000c8000 ffffff8008a58000 0000000000000007 00000000000
0000e
[ 365.006624] bea0: 000000000000005c ffffff8008ee8000 ffffff80090e9000 ffffff80090
e9160
[ 365.015383] bec0: ffffff8009010a30 0000000000000000 0000000000000000 00000000000
00000
[ 365.024143] bee0: 0000000000000000 0000000000000000 0000000000000000 ffffff80092
ebf50
[ 365.032902] bf00: ffffff80080855e0 ffffff80092ebf50 ffffff80080855e4 00000000600
00145
[ 365.041662] bf20: ffffff8008ee8000 ffffff8009010a30 ffffffffffffffff 00000000000
00000
[ 365.050420] bf40: ffffff80092ebf50 ffffff80080855e4
[ 365.055873] [<ffffff80080828f0>] el1_irq+0xb0/0x124
[ 365.061326] [<ffffff80080855e4>] arch_cpu_idle+0x74/0x19c
[ 365.067364] [<ffffff8008885a34>] default_idle_call+0x28/0x30
[ 365.073694] [<ffffff80080eb114>] do_idle+0xd8/0x198
[ 365.079148] [<ffffff80080eb390>] cpu_startup_entry+0x28/0x2c
[ 365.085478] [<ffffff8008090c1c>] secondary_start_kernel+0x184/0x190
[ 492.786002] INFO: task init:1 blocked for more than 120 seconds.
[ 492.792730] Not tainted 4.13.0-rc6-next-20170822 #100
[ 492.799000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this mes
sage.
[ 492.807773] init D 0 1 0 0x00400000
[ 492.813922] Call trace:
[ 492.816662] [<ffffff8008085c28>] __switch_to+0x7c/0x88
[ 492.822415] [<ffffff8008880ebc>] __schedule+0x40c/0x6c4
[ 492.828265] [<ffffff8008881204>] schedule+0x90/0xb0
[ 492.833726] [<ffffff8008884b54>] schedule_timeout+0x4c/0x354
[ 492.840062] [<ffffff8008881f0c>] wait_for_common+0x164/0x19c
[ 492.846397] [<ffffff8008881f6c>] wait_for_completion+0x28/0x34
[ 492.852928] [<ffffff80081051cc>] __wait_rcu_gp+0x160/0x18c
[ 492.859062] [<ffffff800810b248>] synchronize_rcu+0x6c/0x90
[ 492.865205] [<ffffff80086b455c>] synchronize_net+0x38/0x40
[ 492.871348] [<ffffff8008309c4c>] selinux_netcache_avc_callback+0x30/0x40
[ 492.878851] [<ffffff8008302d2c>] avc_ss_reset+0x44/0xa4
[ 492.884703] [<ffffff800830e97c>] sel_write_enforce+0x138/0x198
[ 492.891235] [<ffffff80081f2a00>] __vfs_write+0x60/0x16c
[ 492.897086] [<ffffff80081f3e90>] vfs_write+0xc4/0x160
[ 492.902740] [<ffffff80081f5348>] SyS_write+0x70/0xc8
[ 492.908297] Exception stack(0xffffff800804bec0 to 0xffffff800804c000)
[ 492.915513] bec0: 0000000000000003 00000000ff7f46a0 0000000000000001 00000000fff
fffff
[ 492.924285] bee0: 00000000ff7f46a0 0000000000000003 0000000005a417b8 00000000000
00004
[ 492.933075] bf00: 00000000ff7f5708 00000000ffffffff 00000000045f8d0c 00000000000
00000
[ 492.941843] bf20: 0000000000000000 00000000ff7f4694 00000000ea720eaf 00000000000
00000
[ 492.950609] bf40: 0000000000000000 0000000000000000 0000000000000000 00000000000
00000
[ 492.959375] bf60: 0000000000000000 0000000000000000 0000000000000000 00000000000
00000
[ 492.968141] bf80: 0000000000000000 0000000000000000 0000000000000000 00000000000
00000
[ 492.976907] bfa0: 0000000000000000 0000000000000000 0000000000000000 00000000000
00000
[ 492.985674] bfc0: 00000000ea6aba56 0000000060000030 0000000000000003 00000000000
00004
[ 492.994440] bfe0: 0000000000000000 0000000000000000 0000000000000000 00000000000
00000
[ 493.003207] [<ffffff8008082fb0>] el0_svc_naked+0x24/0x28
[ 493.009153] Kernel panic - not syncing: hung_task: blocked tasks
[ 493.015872] CPU: 0 PID: 43 Comm: khungtaskd Not tainted 4.13.0-rc6-next-20170822
#100
[ 493.024630] Hardware name: Google Kevin (DT)
[ 493.029400] Call trace:
[ 493.032132] [<ffffff8008089ed4>] dump_backtrace+0x0/0x220
[ 493.038169] [<ffffff800808a118>] show_stack+0x24/0x30
[ 493.043819] [<ffffff800886fb6c>] dump_stack+0x98/0xb8
[ 493.049468] [<ffffff80080a99e0>] panic+0x144/0x2a0
[ 493.054826] [<ffffff800814e788>] watchdog+0x348/0x380
[ 493.060475] [<ffffff80080c9de0>] kthread+0x134/0x13c
[ 493.066025] [<ffffff8008084bf0>] ret_from_fork+0x10/0x20
[ 493.071966] SMP: stopping secondary CPUs
[ 493.076352] Kernel Offset: disabled
[ 493.080249] CPU features: 0x00200c
[ 493.084048] Memory Limit: none