Re: BUG_ON in rcu_sync_func triggered

From: Nikolay Borisov
Date: Tue Sep 13 2016 - 11:32:59 EST

On 09/13/2016 05:38 PM, Nikolay Borisov wrote:
>
>
> On 09/13/2016 05:35 PM, Nikolay Borisov wrote:
>>
>>
>> On 09/13/2016 04:43 PM, Oleg Nesterov wrote:
>>> On 09/13, Oleg Nesterov wrote:
>>>>
>>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code,
>>>
>>> Heh, yes, it looks racy or I am totally confused.
>>>
>>>> could you test the debugging patch below meanwhile?
>>>
>>> Yes please. I'll send you another patch (hopefully fix) later, but it
>>> would be nice if you can test this patch to get more info.
>>
>> I've already started testing with this patch on 4.4.20 this time to see
>> what happens, but I'll likely get results tomorrow. For now I wasn't
>> able to crash it.
>
> Actually forget that, here is a warning that this triggered:
>
> [ 844.284959] ------------[ cut here ]------------
> [ 844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150()
> [ 844.300154] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
> [ 844.375006] CPU: 2 PID: 1900 Comm: fio Not tainted 4.4.20-clouder1 #9
> [ 844.382524] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
> [ 844.390241] 0000000000000000 ffff880277d03d78 ffffffff81307a9b 000000000000076c
> [ 844.399416] 0000000000000000 0000000000000000 00000000000000a0 ffff880277d03db8
> [ 844.408598] ffffffff81054a85 ffff880277d03dc8 ffff88047527daa0 ffff88047527da78
> [ 844.417771] Call Trace:
> [ 844.420822] <IRQ> [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
> [ 844.427659] [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
> [ 844.434695] [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
> [ 844.441532] [<ffffffff810ab788>] rcu_sync_func+0xc8/0x150
> [ 844.447983] [<ffffffff810b0620>] rcu_process_callbacks+0x290/0x740
> [ 844.455310] [<ffffffff810bbc52>] ? ktime_get+0x52/0xc0
> [ 844.461459] [<ffffffff810590f3>] __do_softirq+0x113/0x330
> [ 844.467909] [<ffffffff810593e5>] irq_exit+0x75/0x80
> [ 844.473775] [<ffffffff8163ea16>] smp_apic_timer_interrupt+0x46/0x55
> [ 844.481200] [<ffffffff8163d069>] apic_timer_interrupt+0x89/0x90
> [ 844.488234] <EOI> [<ffffffff811477b0>] ? shrink_inactive_list+0x1e0/0x5c0
> [ 844.496426] [<ffffffff811477a8>] ? shrink_inactive_list+0x1d8/0x5c0
> [ 844.503848] [<ffffffff8113c468>] ? global_dirty_limits+0x98/0xc0
> [ 844.510984] [<ffffffff8113c909>] ? throttle_vm_writeout+0x39/0xc0
> [ 844.518214] [<ffffffff811481c9>] shrink_lruvec+0x289/0x390
> [ 844.524754] [<ffffffff8119a6f9>] ? mem_cgroup_iter+0x2a9/0x3e0
> [ 844.531687] [<ffffffff811ce98c>] ? wb_queue_work+0x8c/0x100
> [ 844.538333] [<ffffffff811483fa>] shrink_zone+0x12a/0x360
> [ 844.544686] [<ffffffff8119e9b8>] ? vmpressure+0x88/0x90
> [ 844.550943] [<ffffffff811489ad>] do_try_to_free_pages+0x17d/0x450
> [ 844.558174] [<ffffffff81199451>] ? mem_cgroup_select_victim_node+0x1d1/0x1f0
> [ 844.566468] [<ffffffff81148d35>] try_to_free_mem_cgroup_pages+0xb5/0x190
> [ 844.574375] [<ffffffff8119d9dd>] try_charge+0x22d/0x720
> [ 844.580631] [<ffffffff8113025e>] ? find_get_entry+0x3e/0xd0
> [ 844.587281] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
> [ 844.593827] [<ffffffff8130c443>] ? radix_tree_lookup_slot+0x13/0x30
> [ 844.601251] [<ffffffff8119e637>] mem_cgroup_try_charge+0x57/0x150
> [ 844.608478] [<ffffffff81131b2c>] __add_to_page_cache_locked+0x4c/0x270
> [ 844.616194] [<ffffffff811db990>] ? __block_commit_write+0x80/0xb0
> [ 844.623419] [<ffffffff81131d78>] add_to_page_cache_lru+0x28/0x80
> [ 844.630548] [<ffffffff81131e67>] pagecache_get_page+0x97/0x1e0
> [ 844.637484] [<ffffffff81131fdb>] grab_cache_page_write_begin+0x2b/0x50
> [ 844.645202] [<ffffffff8123ff2d>] ext4_da_write_begin+0x17d/0x330
> [ 844.652334] [<ffffffff8123c716>] ? ext4_dirty_inode+0x66/0x80
> [ 844.659167] [<ffffffff8112ff80>] generic_perform_write+0xd0/0x1f0
> [ 844.666385] [<ffffffff81132916>] __generic_file_write_iter+0x196/0x1f0
> [ 844.674102] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
> [ 844.680648] [<ffffffff81233b0f>] ext4_file_write_iter+0x11f/0x3a0
> [ 844.687874] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
> [ 844.694418] [<ffffffff812339f0>] ? ext4_unwritten_wait+0xc0/0xc0
> [ 844.701547] [<ffffffff811f1a1e>] aio_run_iocb+0x1ee/0x290
> [ 844.707999] [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
> [ 844.714537] [<ffffffff811f1de1>] do_io_submit+0x321/0x530
> [ 844.720989] [<ffffffff811f1388>] ? SyS_io_getevents+0x58/0xc0
> [ 844.727828] [<ffffffff81002017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 844.735345] [<ffffffff811f2000>] SyS_io_submit+0x10/0x20
> [ 844.741701] [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
> [ 844.749230] ---[ end trace 5f72aeec215954f4 ]---
> [ 844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1
>

The warning at kernel/rcu/sync.c:160 comes from this line of the debugging patch:

if (WARN_ON(rsp->gp_state != GP_PASSED)) xxx(rsp);