Re: net/core: BUG in copy_net_ns()

From: Eric W. Biederman
Date: Fri Jan 11 2019 - 18:51:39 EST


zzoru <zzoru007@xxxxxxxxx> writes:

>> I received 3 spam messages from this address today.
>> We can simply ignore this report.
> I already mentioned this.
>
>> and, sorry for my encrypted mails.
>> I don't understand this failure report at all.
>>
>> I don't see the connection to copy_net_ns(). And I don't see how the
>> suggested patch short of covering up a memory stomp could possibly make
>> a difference.
>>
>> What am I missing?
>> void execute_one(void)
>> {
>>   syscall(__NR_unshare, 0x40000000);
>> }
> ksys_unshare -> unshare_nsproxy_namespaces -> create_new_namespaces ->
> copy_net_ns
> unshare(CLONE_NEWNET) calls copy_net_ns() (It requires the CAP_SYS_ADMIN
> capability)
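
For reference, the raw 0x40000000 in the reproducer is CLONE_NEWNET, which
is why the call ends up in copy_net_ns(). A small sketch for decoding such
values follows. The helper name ns_flag_name() is made up for illustration,
and the constants are hard-coded from include/uapi/linux/sched.h so the
snippet does not depend on what the libc headers expose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Namespace-related clone/unshare flag values, copied from
 * include/uapi/linux/sched.h so the example is self-contained. */
struct ns_flag {
	unsigned long value;
	const char *name;
};

static const struct ns_flag ns_flags[] = {
	{ 0x00020000UL, "CLONE_NEWNS" },
	{ 0x02000000UL, "CLONE_NEWCGROUP" },
	{ 0x04000000UL, "CLONE_NEWUTS" },
	{ 0x08000000UL, "CLONE_NEWIPC" },
	{ 0x10000000UL, "CLONE_NEWUSER" },
	{ 0x20000000UL, "CLONE_NEWPID" },
	{ 0x40000000UL, "CLONE_NEWNET" },
};

/* Hypothetical helper: map a single flag bit to its name, or return
 * "unknown" if it is not one of the namespace flags listed above. */
static const char *ns_flag_name(unsigned long flag)
{
	size_t i;

	for (i = 0; i < sizeof(ns_flags) / sizeof(ns_flags[0]); i++)
		if (ns_flags[i].value == flag)
			return ns_flags[i].name;
	return "unknown";
}

/* Sanity check tying the table back to the reproducer's argument. */
static void ns_flag_selfcheck(void)
{
	assert(strcmp(ns_flag_name(0x40000000UL), "CLONE_NEWNET") == 0);
}
```

The same lookup also resolves the bare unshare(0x02000000) in the C
reproducer's sandbox code to CLONE_NEWCGROUP.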

Looking at your alternate patch, where you swap the order of the
structure members, it looks like there is a memory stomp, probably a
use-after-free. It is a shame that KASAN is not catching the problem.
That is my only suggestion at the moment.
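
To illustrate why swapping two members can hide a stomp rather than fix
it: a stray write to a fixed offset corrupts whichever member happens to
live at that offset. The sketch below uses two made-up layouts with plain
ints standing in for refcount_t. It is not the real struct net, and the
offset-0 write is an assumption chosen only to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the two member orders in the patch.
 * Illustrative only; the real struct net is much larger. */
struct layout_before {
	int passive;	/* first member, offset 0 */
	int count;
};

struct layout_after {
	int count;	/* first member, offset 0 */
	int passive;
};

/* Simulate a buggy writer that clobbers sizeof(int) bytes at a fixed
 * byte offset inside an object it should no longer touch. */
static void stomp(void *obj, size_t offset, int garbage)
{
	memcpy((char *)obj + offset, &garbage, sizeof(garbage));
}

/* Original order: the stomp at offset 0 corrupts 'passive'.
 * Returns 1 if exactly that happened. */
static int stomp_hits_passive_before(void)
{
	struct layout_before b = { .passive = 1, .count = 1 };

	assert(offsetof(struct layout_before, passive) == 0);
	stomp(&b, 0, 0x5a5a5a5a);
	return b.passive != 1 && b.count == 1;
}

/* Swapped order: the same stomp now corrupts 'count' and 'passive'
 * survives, so the symptom moves while the bug remains.
 * Returns 1 if exactly that happened. */
static int stomp_hits_count_after(void)
{
	struct layout_after a = { .count = 1, .passive = 1 };

	assert(offsetof(struct layout_after, count) == 0);
	stomp(&a, 0, 0x5a5a5a5a);
	return a.count != 1 && a.passive == 1;
}
```

Both helpers return 1: the write is identical in each case and only the
victim changes, which is consistent with a reordering patch that appears
to "work" without addressing the underlying corruption.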

The OOM may be happening because network namespaces are being created in
quick succession and they take a while to free.
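
A toy model of that effect, assuming creation and teardown each proceed
at a fixed rate per time step. Both rates are invented parameters for
illustration, not measurements of the kernel, and ns_backlog() is a
hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Toy model: every tick 'created' namespaces are set up and at most
 * 'freed' of the outstanding ones are torn down.  Returns how many
 * are still waiting to be freed after 'ticks' steps. */
static unsigned long ns_backlog(unsigned int ticks, unsigned long created,
				unsigned long freed)
{
	unsigned long backlog = 0;
	unsigned int t;

	for (t = 0; t < ticks; t++) {
		/* Guard against wrap-around in this simple counter. */
		assert(backlog <= ULONG_MAX - created);
		backlog += created;
		/* Free as many as the teardown rate allows, never more
		 * than are actually outstanding. */
		backlog -= (backlog < freed) ? backlog : freed;
	}
	return backlog;
}
```

With eight creators per tick (the reproducer runs with Procs:8) and one
teardown per tick, ns_backlog(10, 8, 1) is 70: the backlog grows without
bound for as long as creation outpaces freeing, even if nothing leaks.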

One of the nasty truths about testing is that sometimes you can be
testing one thing and trigger a bug in something completely different.
Right now it looks like anything that copy_net_ns() calls could be
responsible for the memory problems.
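
The OOM trace quoted further down shows copy_net_ns() calling
setup_net(), which calls ops_init() for a registered subsystem
(icmpv6_sk_init in that trace). A simplified model of such a chain of
init hooks, with rollback in reverse order on failure, is sketched here.
All names (subsys_ops, setup_all, init_ok, ...) are hypothetical and the
real kernel code is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-namespace subsystem: an init hook
 * that may fail and an exit hook that undoes it. */
struct subsys_ops {
	int (*init)(int *state);
	void (*exit)(int *state);
};

static int init_ok(int *state)   { (*state)++; return 0; }
static int init_fail(int *state) { (void)state; return -1; }
static void exit_one(int *state) { (*state)--; }

/* Run every init hook in order.  If one fails, call the exit hooks of
 * the hooks that already succeeded, in reverse order, and return the
 * error.  'state' counts how many subsystems are currently set up. */
static int setup_all(const struct subsys_ops *ops, size_t n, int *state)
{
	size_t i;
	int err;

	assert(state != NULL);
	for (i = 0; i < n; i++) {
		err = ops[i].init(state);
		if (err) {
			while (i-- > 0)
				ops[i].exit(state);
			return err;
		}
	}
	return 0;
}
```

Every hook in such a chain allocates and frees on behalf of the new
namespace, so a bug in any one of them would show up under the same
caller in a trace.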

Eric


> I have collected many error reports for this bug; another one is:
>
> [   90.289025] WARNING: CPU: 1 PID: 1732 at mm/page_alloc.c:4415
> __alloc_pages_slowpath+0x1cb1/0x2220
> [   90.290223] Modules linked in:
> [   90.290639] CPU: 1 PID: 1732 Comm: kworker/u4:5 Not tainted 5.0.0-rc1+ #6
> [   90.291475] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   90.292681] Workqueue: writeback wb_workfn (flush-8:0)
> [   90.293350] RIP: 0010:__alloc_pages_slowpath+0x1cb1/0x2220
> [   90.294075] Code: 8b 84 24 a8 00 00 00 e9 ea f1 ff ff 85 d2 0f 85 0b
> 01 00 00 48 c7 c7 c0 5e 55 84 e8 79 f8 23 02 e9 86 f9 ff ff 44 8b 74 24
> 0c <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 8b 54 24 18 48 c1 ea 03 80
> [   90.296527] RSP: 0018:ffff888064276dd8 EFLAGS: 00010046
> [   90.297203] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 1ffff1100c84eda8
> [   90.297784] kmemleak: Cannot allocate a kmemleak_object structure
> [   90.298186] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff88807ffdd528
> [   90.298242] RBP: dffffc0000000000 R08: 0000000000000000 R09:
> 0000000000000679
> [   90.298247] R10: 0000000000000000 R11: ffff88807ffdc487 R12:
> 0000000000000000
> [   90.298251] R13: ffff888064277030 R14: 0000000000415a00 R15:
> ffff888064277030
> [   90.298257] FS:  0000000000000000(0000) GS:ffff88806d500000(0000)
> knlGS:0000000000000000
> [   90.298262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   90.298267] CR2: 00007fff6ac6a718 CR3: 0000000056578000 CR4:
> 00000000000006e0
> [   90.298272] Call Trace:
> [   90.298283]  ? __alloc_pages_slowpath+0x1ce6/0x2220
> [   90.298299]  ? warn_alloc+0x120/0x120
> [   90.302432] kmemleak: Kernel memory leak detector disabled
> [   90.303346]  ? lock_acquire+0x103/0x2e0
> [   90.303358]  ? __isolate_free_page+0x4b0/0x4b0
> [   90.303366]  ? __lock_is_held+0xad/0x140
> [   90.303377]  __alloc_pages_nodemask+0x521/0x5f0
> [   90.303386]  ? __alloc_pages_slowpath+0x2220/0x2220
> [   90.315010]  cache_grow_begin+0x95/0x300
> [   90.315613]  fallback_alloc+0x1ce/0x270
> [   90.316211]  ? mempool_free+0x360/0x360
> [   90.316767]  kmem_cache_alloc+0x286/0x2f0
> [   90.317348]  ? mempool_free+0x360/0x360
> [   90.317919]  create_object+0x83/0x880
> [   90.318517]  ? kmemleak_disable+0x90/0x90
> [   90.319103]  ? mark_held_locks+0xc1/0x140
> [   90.319679]  ? kmem_cache_alloc+0x9c/0x2f0
> [   90.320307]  ? mempool_free+0x360/0x360
> [   90.320900]  kmem_cache_alloc+0x18f/0x2f0
> [   90.321650]  ? mempool_free+0x360/0x360
> [   90.322228]  mempool_alloc+0x13e/0x340
> [   90.322765]  ? mempool_destroy+0x30/0x30
> [   90.323370]  ? mark_held_locks+0xc1/0x140
> [   90.323993]  ? _raw_spin_unlock_irqrestore+0x3e/0x50
> [   90.324786]  bio_alloc_bioset+0x36f/0x5d0
> [   90.325397]  ? __test_set_page_writeback+0x136/0x960
> [   90.326161]  ? bvec_alloc+0x2d0/0x2d0
> [   90.326708]  ? wait_for_stable_page+0x290/0x290
> [   90.327392]  submit_bh_wbc.isra.57+0x128/0x680
> [   90.328053]  ? create_page_buffers+0x111/0x200
> [   90.328685]  __block_write_full_page+0x6e8/0xcd0
> [   90.329339]  ? check_disk_change+0x130/0x130
> [   90.329966]  block_write_full_page+0x202/0x250
> [   90.330675]  ? check_disk_change+0x130/0x130
> [   90.331291]  __writepage+0x62/0xe0
> [   90.331786]  write_cache_pages+0x5b8/0xf60
> [   90.332375]  ? __wb_calc_thresh+0x290/0x290
> [   90.332976]  ? clear_page_dirty_for_io+0x5c0/0x5c0
> [   90.333686]  ? mark_held_locks+0x140/0x140
> [   90.334301]  ? print_circular_bug_entry+0x1f/0x60
> [   90.334999]  ? __lock_acquire+0x5d6/0x4630
> [   90.335621]  generic_writepages+0xda/0x150
> [   90.336243]  ? write_cache_pages+0xf60/0xf60
> [   90.336852]  ? mark_held_locks+0x140/0x140
> [   90.337453]  ? blkdev_readpages+0x30/0x30
> [   90.338020]  do_writepages+0xf0/0x290
> [   90.338611]  ? page_writeback_cpu_online+0x10/0x10
> [   90.339324]  ? __lock_is_held+0xad/0x140
> [   90.339900]  __writeback_single_inode+0xf3/0x1000
> [   90.340587]  writeback_sb_inodes+0x4e7/0xce0
> [   90.341214]  ? __writeback_single_inode+0x1000/0x1000
> [   90.341929]  ? down_read_trylock+0x5b/0x90
> [   90.342579]  ? trylock_super+0x1d/0x100
> [   90.343162]  __writeback_inodes_wb+0x109/0x220
> [   90.343799]  wb_writeback+0x7a1/0xb90
> [   90.344347]  ? writeback_inodes_wb.constprop.44+0x190/0x190
> [   90.345143]  ? cpumask_next+0x1f/0x30
> [   90.345679]  ? find_next_bit+0x101/0x130
> [   90.346281]  ? get_nr_dirty_inodes+0xd0/0x130
> [   90.346909]  wb_workfn+0x921/0xec0
> [   90.347397]  ? process_one_work+0xadd/0x1bb0
> [   90.348025]  ? inode_wait_for_writeback+0x30/0x30
> [   90.348700]  process_one_work+0xbbd/0x1bb0
> [   90.349314]  ? max_active_store+0x130/0x130
> [   90.349915]  ? do_raw_spin_lock+0x11b/0x280
> [   90.350557]  worker_thread+0x8c/0x1060
> [   90.351096]  ? __kthread_parkme+0xf8/0x1a0
> [   90.351673]  ? process_one_work+0x1bb0/0x1bb0
> [   90.352334]  kthread+0x347/0x410
> [   90.352798]  ? kthread_create_worker_on_cpu+0xe0/0xe0
> [   90.353509]  ret_from_fork+0x3a/0x50
> [   90.354020] irq event stamp: 282384
> [   90.354590] hardirqs last enabled at (282383): [<ffffffff8160678c>]
> kmem_cache_alloc+0x9c/0x2f0
> [   90.355832] hardirqs last disabled at (282384): [<ffffffff8160674d>]
> kmem_cache_alloc+0x5d/0x2f0
> [   90.357066] softirqs last enabled at (282196): [<ffffffff816daa87>]
> wb_workfn+0x387/0xec0
> [   90.358280] softirqs last disabled at (282194): [<ffffffff816da918>]
> wb_workfn+0x218/0xec0
> [   90.359426] ---[ end trace 71c4462c6227f0d8 ]---
> [   90.360135] kmemleak: Cannot allocate a kmemleak_object structure
> [   90.888624] a.out invoked oom-killer:
> gfp_mask=0x6040d0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
> oom_score_adj=0
> [   90.890564] CPU: 0 PID: 22248 Comm: a.out Tainted: G        W
> 5.0.0-rc1+ #6
> [   90.891793] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   90.893263] Call Trace:
> [   90.893678]  dump_stack+0xca/0x13e
> [   90.894242]  dump_header+0x108/0xaef
> [   90.894822]  ? ___ratelimit+0x5b/0x436
> [   90.895430]  oom_kill_process.cold.38+0x10/0xa87
> [   90.896164]  ? lock_downgrade+0x5d0/0x5d0
> [   90.896806]  ? _raw_spin_unlock+0x1f/0x30
> [   90.897445]  ? oom_badness+0xc8/0x770
> [   90.898045]  out_of_memory+0x32a/0x1ab0
> [   90.898668]  ? oom_killer_disable+0x280/0x280
> [   90.899365]  ? mutex_trylock+0x162/0x1a0
> [   90.899998]  __alloc_pages_slowpath+0x1b7a/0x2220
> [   90.900754]  ? warn_alloc+0x120/0x120
> [   90.901344]  ? find_held_lock+0x33/0x1c0
> [   90.901985]  __alloc_pages_nodemask+0x521/0x5f0
> [   90.902723]  ? __alloc_pages_slowpath+0x2220/0x2220
> [   90.903499]  ? mark_held_locks+0xc1/0x140
> [   90.904137]  ? cache_grow_begin+0x28f/0x300
> [   90.904807]  cache_grow_begin+0x95/0x300
> [   90.905443]  fallback_alloc+0x1ce/0x270
> [   90.906074]  kmem_cache_alloc+0x286/0x2f0
> [   90.906720]  ? sock_destroy_inode+0x60/0x60
> [   90.907392]  sock_alloc_inode+0x18/0x250
> [   90.908021]  ? sock_destroy_inode+0x60/0x60
> [   90.908690]  alloc_inode+0x5e/0x180
> [   90.909254]  new_inode_pseudo+0x12/0xd0
> [   90.909868]  sock_alloc+0x3c/0x270
> [   90.910428]  __sock_create+0xbe/0x740
> [   90.911026]  inet_ctl_sock_create+0x8c/0x1e0
> [   90.911710]  ? inet_current_timestamp+0xc0/0xc0
> [   90.912432]  ? rcu_read_lock_sched_held+0x10f/0x130
> [   90.913205]  ? find_next_bit+0x101/0x130
> [   90.913837]  icmpv6_sk_init+0x12a/0x2b0
> [   90.914463]  ? inet6_net_init+0x437/0x7c0
> [   90.915102]  ? icmpv6_err_convert+0x180/0x180
> [   90.915799]  ? ac6_proc_init+0x5a/0x70
> [   90.916402]  ? inet6_net_init+0x53b/0x7c0
> [   90.917041]  ? icmpv6_err_convert+0x180/0x180
> [   90.917734]  ops_init+0xb2/0x400
> [   90.918265]  setup_net+0x24c/0x5e0
> [   90.918817]  ? ops_init+0x400/0x400
> [   90.919386]  copy_net_ns+0x1a2/0x270
> [   90.919969]  create_new_namespaces+0x579/0x790
> [   90.920676]  unshare_nsproxy_namespaces+0xc3/0x190
> [   90.921435]  ksys_unshare+0x428/0x810
> [   90.922029]  ? walk_process_tree+0x2c0/0x2c0
> [   90.922712]  ? __change_pid+0x19c/0x2c0
> [   90.923328]  ? _raw_write_unlock_irq+0x24/0x30
> [   90.924038]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [   90.924771]  ? trace_hardirqs_off_caller+0x55/0x1c0
> [   90.925547]  __x64_sys_unshare+0x2d/0x40
> [   90.926187]  do_syscall_64+0xbc/0x4e0
> [   90.926777]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   90.927573] RIP: 0033:0x7f827ad52229
> [   90.928146] Code: Bad RIP value.
> [   90.928663] RSP: 002b:00007fff6ac6a6c8 EFLAGS: 00000217 ORIG_RAX:
> 0000000000000110
> [   90.929837] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007f827ad52229
> [   90.930952] RDX: 00007f827ad27147 RSI: 0000000000000000 RDI:
> 0000000040000000
> [   90.932056] RBP: 00007fff6ac6a6d0 R08: 0000000000000005 R09:
> 00007fff6ac6a720
> [   90.933165] R10: 0000000000000000 R11: 0000000000000217 R12:
> 00005607242822e0
> [   90.934278] R13: 00007fff6ac6a830 R14: 0000000000000000 R15:
> 0000000000000000
>
> My guess is that copy_net_ns() doesn't call net_free(), and that causes
> the OOM.
>
> And, I found that
>
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index 99d4148e0f90..38c474e4ab4c 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -50,12 +50,12 @@ struct bpf_prog;
>  #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
>
>  struct net {
> -       refcount_t              passive;        /* To decided when the network
> -                                                * namespace should be freed.
> -                                                */
>         refcount_t              count;          /* To decided when the network
>                                                  *  namespace should be shut down.
>                                                  */
> +       refcount_t              passive;        /* To decided when the network
> +                                                * namespace should be freed.
> +                                                */
>         spinlock_t              rules_mod_lock;
>
>         atomic64_t              cookie_gen;
>
> This patch also makes the bug go away. (It just swaps the order of two
> members of struct net.)
> I don't know why this patch works. (I thought a compiler optimization
> issue might be causing this bug, so I tried it.)
> I need to review the code in copy_net_ns() more.
>
> Also, I reproduced this bug on Ubuntu 18.10 (4.18.0-10-generic) on VMware
> Workstation Pro 15.0.2 with the C reproducer.
>
> On 12/01/2019 5:41 AM, Kirill Tkhai wrote:
>> On 11.01.2019 23:33, Eric W. Biederman wrote:
>>> zzoru <zzoru007@xxxxxxxxx> writes:
>>>
>>>> net/core: BUG in copy_net_ns() (net_namespace.c)
>>> I don't understand this failure report at all.
>>>
>>> I don't see the connection to copy_net_ns(). And I don't see how the
>>> suggested patch short of covering up a memory stomp could possibly make
>>> a difference.
>>>
>>> What am I missing?
>> I received 3 spam messages from this address today.
>> We can simply ignore this report.
>>
>>>
>>>> Hello,
>>>>
>>>> I've got the following error report while fuzzing the kernel with syzkaller.
>>>>
>>>> On commit 1bdbe227492075d058e37cb3d400e6468d0095b5
>>>>
>>>> Syzkaller hit 'WARNING in __alloc_pages_slowpath' bug.
>>>>
>>>> syz-executor561 (17453) used greatest stack depth: 25056 bytes left
>>>> WARNING: CPU: 0 PID: 692 at mm/page_alloc.c:4415
>>>> __alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4386
>>>> Kernel panic - not syncing: panic_on_warn set ...
>>>> CPU: 0 PID: 692 Comm: kswapd0 Not tainted 5.0.0-rc1+ #4
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>>>> Call Trace:
>>>>  __dump_stack lib/dump_stack.c:77 [inline]
>>>>  dump_stack+0xca/0x13e lib/dump_stack.c:113
>>>>  panic+0x278/0x5bf kernel/panic.c:214
>>>>  __warn.cold.10+0x20/0x45 kernel/panic.c:571
>>>>  report_bug+0x246/0x2d0 lib/bug.c:186
>>>>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>>>>  do_error_trap+0x123/0x1e0 arch/x86/kernel/traps.c:271
>>>>  do_invalid_op+0x31/0x40 arch/x86/kernel/traps.c:290
>>>>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
>>>> RIP: 0010:__alloc_pages_slowpath+0x1cb1/0x2220 mm/page_alloc.c:4415
>>>> Code: 8b 84 24 a8 00 00 00 e9 ea f1 ff ff 85 d2 0f 85 0b 01 00 00 48 c7
>>>> c7 c0 5e 55 84 e8 79 f8 23 02 e9 86 f9 ff ff 44 8b 74 24 0c <0f> 0b 48
>>>> b8 00 00 00 00 00 fc ff df 48 8b 54 24 18 48 c1 ea 03 80
>>>> RSP: 0018:ffff8880683fedb8 EFLAGS: 00010046
>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff1100d07fda4
>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88807ffdd528
>>>> RBP: dffffc0000000000 R08: 0000000000000000 R09: 000000000000067a
>>>> R10: 0000000000000000 R11: ffff88807ffdc487 R12: 0000000000000000
>>>> R13: ffff8880683ff010 R14: 0000000000415a00 R15: ffff8880683ff010
>>>>  __alloc_pages_nodemask+0x521/0x5f0 mm/page_alloc.c:4555
>>>>  __alloc_pages include/linux/gfp.h:473 [inline]
>>>>  __alloc_pages_node include/linux/gfp.h:486 [inline]
>>>>  kmem_getpages mm/slab.c:1398 [inline]
>>>>  cache_grow_begin+0x95/0x300 mm/slab.c:2666
>>>>  fallback_alloc+0x1ce/0x270 mm/slab.c:3208
>>>>  __do_cache_alloc mm/slab.c:3345 [inline]
>>>>  slab_alloc mm/slab.c:3373 [inline]
>>>>  kmem_cache_alloc+0x286/0x2f0 mm/slab.c:3541
>>>>  create_object+0x83/0x880 mm/kmemleak.c:578
>>>>  kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
>>>>  slab_post_alloc_hook mm/slab.h:442 [inline]
>>>>  slab_alloc mm/slab.c:3381 [inline]
>>>>  kmem_cache_alloc+0x18f/0x2f0 mm/slab.c:3541
>>>>  mempool_alloc+0x13e/0x340 mm/mempool.c:385
>>>>  bio_alloc_bioset+0x36f/0x5d0 block/bio.c:489
>>>>  bio_alloc include/linux/bio.h:393 [inline]
>>>>  submit_bh_wbc.isra.57+0x128/0x680 fs/buffer.c:3061
>>>>  __block_write_full_page+0x6e8/0xcd0 fs/buffer.c:1765
>>>>  block_write_full_page+0x202/0x250 fs/buffer.c:2955
>>>>  pageout mm/vmscan.c:865 [inline]
>>>>  shrink_page_list+0x220f/0x3800 mm/vmscan.c:1383
>>>>  shrink_inactive_list+0x3c2/0xaa0 mm/vmscan.c:1961
>>>>  shrink_list mm/vmscan.c:2273 [inline]
>>>>  shrink_node_memcg.constprop.83+0x4bf/0x10e0 mm/vmscan.c:2538
>>>>  shrink_node+0x162/0xd10 mm/vmscan.c:2753
>>>>  kswapd_shrink_node mm/vmscan.c:3516 [inline]
>>>>  balance_pgdat+0x47f/0xc00 mm/vmscan.c:3674
>>>>  kswapd+0x57c/0xde0 mm/vmscan.c:3929
>>>>  kthread+0x347/0x410 kernel/kthread.c:246
>>>>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
>>>> Dumping ftrace buffer:
>>>>    (ftrace buffer empty)
>>>> Kernel Offset: disabled
>>>> Rebooting in 86400 seconds..
>>>>
>>>>
>>>> Syzkaller reproducer:
>>>> # {Threaded:false Collide:false Repeat:true RepeatTimes:0 Procs:8
>>>> Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:false
>>>> UseTmpDir:true EnableCgroups:false EnableNetdev:true ResetNet:false
>>>> HandleSegv:false Repro:false Trace:false}
>>>> unshare(0x40000000)
>>>>
>>>>
>>>> C reproducer:
>>>> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>>>>
>>>> #define _GNU_SOURCE
>>>>
>>>> #include <arpa/inet.h>
>>>> #include <dirent.h>
>>>> #include <endian.h>
>>>> #include <errno.h>
>>>> #include <fcntl.h>
>>>> #include <net/if.h>
>>>> #include <net/if_arp.h>
>>>> #include <netinet/in.h>
>>>> #include <sched.h>
>>>> #include <signal.h>
>>>> #include <stdarg.h>
>>>> #include <stdbool.h>
>>>> #include <stdint.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <string.h>
>>>> #include <sys/ioctl.h>
>>>> #include <sys/mount.h>
>>>> #include <sys/prctl.h>
>>>> #include <sys/resource.h>
>>>> #include <sys/socket.h>
>>>> #include <sys/stat.h>
>>>> #include <sys/syscall.h>
>>>> #include <sys/time.h>
>>>> #include <sys/types.h>
>>>> #include <sys/uio.h>
>>>> #include <sys/wait.h>
>>>> #include <time.h>
>>>> #include <unistd.h>
>>>>
>>>> #include <linux/if_addr.h>
>>>> #include <linux/if_ether.h>
>>>> #include <linux/if_link.h>
>>>> #include <linux/if_tun.h>
>>>> #include <linux/in6.h>
>>>> #include <linux/ip.h>
>>>> #include <linux/neighbour.h>
>>>> #include <linux/net.h>
>>>> #include <linux/netlink.h>
>>>> #include <linux/rtnetlink.h>
>>>> #include <linux/tcp.h>
>>>> #include <linux/veth.h>
>>>>
>>>> unsigned long long procid;
>>>>
>>>> static void sleep_ms(uint64_t ms)
>>>> {
>>>>   usleep(ms * 1000);
>>>> }
>>>>
>>>> static uint64_t current_time_ms(void)
>>>> {
>>>>   struct timespec ts;
>>>>   if (clock_gettime(CLOCK_MONOTONIC, &ts))
>>>>     exit(1);
>>>>   return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
>>>> }
>>>>
>>>> static void use_temporary_dir(void)
>>>> {
>>>>   char tmpdir_template[] = "./syzkaller.XXXXXX";
>>>>   char* tmpdir = mkdtemp(tmpdir_template);
>>>>   if (!tmpdir)
>>>>     exit(1);
>>>>   if (chmod(tmpdir, 0777))
>>>>     exit(1);
>>>>   if (chdir(tmpdir))
>>>>     exit(1);
>>>> }
>>>>
>>>> static bool write_file(const char* file, const char* what, ...)
>>>> {
>>>>   char buf[1024];
>>>>   va_list args;
>>>>   va_start(args, what);
>>>>   vsnprintf(buf, sizeof(buf), what, args);
>>>>   va_end(args);
>>>>   buf[sizeof(buf) - 1] = 0;
>>>>   int len = strlen(buf);
>>>>   int fd = open(file, O_WRONLY | O_CLOEXEC);
>>>>   if (fd == -1)
>>>>     return false;
>>>>   if (write(fd, buf, len) != len) {
>>>>     int err = errno;
>>>>     close(fd);
>>>>     errno = err;
>>>>     return false;
>>>>   }
>>>>   close(fd);
>>>>   return true;
>>>> }
>>>>
>>>> static struct {
>>>>   char* pos;
>>>>   int nesting;
>>>>   struct nlattr* nested[8];
>>>>   char buf[1024];
>>>> } nlmsg;
>>>>
>>>> static void netlink_init(int typ, int flags, const void* data, int size)
>>>> {
>>>>   memset(&nlmsg, 0, sizeof(nlmsg));
>>>>   struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;
>>>>   hdr->nlmsg_type = typ;
>>>>   hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
>>>>   memcpy(hdr + 1, data, size);
>>>>   nlmsg.pos = (char*)(hdr + 1) + NLMSG_ALIGN(size);
>>>> }
>>>>
>>>> static void netlink_attr(int typ, const void* data, int size)
>>>> {
>>>>   struct nlattr* attr = (struct nlattr*)nlmsg.pos;
>>>>   attr->nla_len = sizeof(*attr) + size;
>>>>   attr->nla_type = typ;
>>>>   memcpy(attr + 1, data, size);
>>>>   nlmsg.pos += NLMSG_ALIGN(attr->nla_len);
>>>> }
>>>>
>>>> static void netlink_nest(int typ)
>>>> {
>>>>   struct nlattr* attr = (struct nlattr*)nlmsg.pos;
>>>>   attr->nla_type = typ;
>>>>   nlmsg.pos += sizeof(*attr);
>>>>   nlmsg.nested[nlmsg.nesting++] = attr;
>>>> }
>>>>
>>>> static void netlink_done(void)
>>>> {
>>>>   struct nlattr* attr = nlmsg.nested[--nlmsg.nesting];
>>>>   attr->nla_len = nlmsg.pos - (char*)attr;
>>>> }
>>>>
>>>> static int netlink_send(int sock)
>>>> {
>>>>   if (nlmsg.pos > nlmsg.buf + sizeof(nlmsg.buf) || nlmsg.nesting)
>>>>     exit(1);
>>>>   struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg.buf;
>>>>   hdr->nlmsg_len = nlmsg.pos - nlmsg.buf;
>>>>   struct sockaddr_nl addr;
>>>>   memset(&addr, 0, sizeof(addr));
>>>>   addr.nl_family = AF_NETLINK;
>>>>   unsigned n = sendto(sock, nlmsg.buf, hdr->nlmsg_len, 0,
>>>>                       (struct sockaddr*)&addr, sizeof(addr));
>>>>   if (n != hdr->nlmsg_len)
>>>>     exit(1);
>>>>   n = recv(sock, nlmsg.buf, sizeof(nlmsg.buf), 0);
>>>>   if (n < sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))
>>>>     exit(1);
>>>>   if (hdr->nlmsg_type != NLMSG_ERROR)
>>>>     exit(1);
>>>>   return -((struct nlmsgerr*)(hdr + 1))->error;
>>>> }
>>>>
>>>> static void netlink_add_device_impl(const char* type, const char* name)
>>>> {
>>>>   struct ifinfomsg hdr;
>>>>   memset(&hdr, 0, sizeof(hdr));
>>>>   netlink_init(RTM_NEWLINK, NLM_F_EXCL | NLM_F_CREATE, &hdr, sizeof(hdr));
>>>>   if (name)
>>>>     netlink_attr(IFLA_IFNAME, name, strlen(name));
>>>>   netlink_nest(IFLA_LINKINFO);
>>>>   netlink_attr(IFLA_INFO_KIND, type, strlen(type));
>>>> }
>>>>
>>>> static void netlink_add_device(int sock, const char* type, const char* name)
>>>> {
>>>>   netlink_add_device_impl(type, name);
>>>>   netlink_done();
>>>>   int err = netlink_send(sock);
>>>>   (void)err;
>>>> }
>>>>
>>>> static void netlink_add_veth(int sock, const char* name, const char* peer)
>>>> {
>>>>   netlink_add_device_impl("veth", name);
>>>>   netlink_nest(IFLA_INFO_DATA);
>>>>   netlink_nest(VETH_INFO_PEER);
>>>>   nlmsg.pos += sizeof(struct ifinfomsg);
>>>>   netlink_attr(IFLA_IFNAME, peer, strlen(peer));
>>>>   netlink_done();
>>>>   netlink_done();
>>>>   netlink_done();
>>>>   int err = netlink_send(sock);
>>>>   (void)err;
>>>> }
>>>>
>>>> static void netlink_add_hsr(int sock, const char* name, const char* slave1,
>>>>                             const char* slave2)
>>>> {
>>>>   netlink_add_device_impl("hsr", name);
>>>>   netlink_nest(IFLA_INFO_DATA);
>>>>   int ifindex1 = if_nametoindex(slave1);
>>>>   netlink_attr(IFLA_HSR_SLAVE1, &ifindex1, sizeof(ifindex1));
>>>>   int ifindex2 = if_nametoindex(slave2);
>>>>   netlink_attr(IFLA_HSR_SLAVE2, &ifindex2, sizeof(ifindex2));
>>>>   netlink_done();
>>>>   netlink_done();
>>>>   int err = netlink_send(sock);
>>>>   (void)err;
>>>> }
>>>>
>>>> static void netlink_device_change(int sock, const char* name, bool up,
>>>>                                   const char* master, const void* mac,
>>>>                                   int macsize)
>>>> {
>>>>   struct ifinfomsg hdr;
>>>>   memset(&hdr, 0, sizeof(hdr));
>>>>   if (up)
>>>>     hdr.ifi_flags = hdr.ifi_change = IFF_UP;
>>>>   netlink_init(RTM_NEWLINK, 0, &hdr, sizeof(hdr));
>>>>   netlink_attr(IFLA_IFNAME, name, strlen(name));
>>>>   if (master) {
>>>>     int ifindex = if_nametoindex(master);
>>>>     netlink_attr(IFLA_MASTER, &ifindex, sizeof(ifindex));
>>>>   }
>>>>   if (macsize)
>>>>     netlink_attr(IFLA_ADDRESS, mac, macsize);
>>>>   int err = netlink_send(sock);
>>>>   (void)err;
>>>> }
>>>>
>>>> static int netlink_add_addr(int sock, const char* dev, const void* addr,
>>>>                             int addrsize)
>>>> {
>>>>   struct ifaddrmsg hdr;
>>>>   memset(&hdr, 0, sizeof(hdr));
>>>>   hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6;
>>>>   hdr.ifa_prefixlen = addrsize == 4 ? 24 : 120;
>>>>   hdr.ifa_scope = RT_SCOPE_UNIVERSE;
>>>>   hdr.ifa_index = if_nametoindex(dev);
>>>>   netlink_init(RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr,
>>>> sizeof(hdr));
>>>>   netlink_attr(IFA_LOCAL, addr, addrsize);
>>>>   netlink_attr(IFA_ADDRESS, addr, addrsize);
>>>>   return netlink_send(sock);
>>>> }
>>>>
>>>> static void netlink_add_addr4(int sock, const char* dev, const char* addr)
>>>> {
>>>>   struct in_addr in_addr;
>>>>   inet_pton(AF_INET, addr, &in_addr);
>>>>   int err = netlink_add_addr(sock, dev, &in_addr, sizeof(in_addr));
>>>>   (void)err;
>>>> }
>>>>
>>>> static void netlink_add_addr6(int sock, const char* dev, const char* addr)
>>>> {
>>>>   struct in6_addr in6_addr;
>>>>   inet_pton(AF_INET6, addr, &in6_addr);
>>>>   int err = netlink_add_addr(sock, dev, &in6_addr, sizeof(in6_addr));
>>>>   (void)err;
>>>> }
>>>>
>>>> #define DEV_IPV4 "172.20.20.%d"
>>>> #define DEV_IPV6 "fe80::%02hx"
>>>> #define DEV_MAC 0x00aaaaaaaaaa
>>>> static void initialize_netdevices(void)
>>>> {
>>>>   char netdevsim[16];
>>>>   sprintf(netdevsim, "netdevsim%d", (int)procid);
>>>>   struct {
>>>>     const char* type;
>>>>     const char* dev;
>>>>   } devtypes[] = {
>>>>       {"ip6gretap", "ip6gretap0"}, {"bridge", "bridge0"},
>>>>       {"vcan", "vcan0"},           {"bond", "bond0"},
>>>>       {"team", "team0"},           {"dummy", "dummy0"},
>>>>       {"nlmon", "nlmon0"},         {"caif", "caif0"},
>>>>       {"batadv", "batadv0"},       {"vxcan", "vxcan1"},
>>>>       {"netdevsim", netdevsim},    {"veth", 0},
>>>>   };
>>>>   const char* devmasters[] = {"bridge", "bond", "team"};
>>>>   struct {
>>>>     const char* name;
>>>>     int macsize;
>>>>     bool noipv6;
>>>>   } devices[] = {
>>>>       {"lo", ETH_ALEN},
>>>>       {"sit0", 0},
>>>>       {"bridge0", ETH_ALEN},
>>>>       {"vcan0", 0, true},
>>>>       {"tunl0", 0},
>>>>       {"gre0", 0},
>>>>       {"gretap0", ETH_ALEN},
>>>>       {"ip_vti0", 0},
>>>>       {"ip6_vti0", 0},
>>>>       {"ip6tnl0", 0},
>>>>       {"ip6gre0", 0},
>>>>       {"ip6gretap0", ETH_ALEN},
>>>>       {"erspan0", ETH_ALEN},
>>>>       {"bond0", ETH_ALEN},
>>>>       {"veth0", ETH_ALEN},
>>>>       {"veth1", ETH_ALEN},
>>>>       {"team0", ETH_ALEN},
>>>>       {"veth0_to_bridge", ETH_ALEN},
>>>>       {"veth1_to_bridge", ETH_ALEN},
>>>>       {"veth0_to_bond", ETH_ALEN},
>>>>       {"veth1_to_bond", ETH_ALEN},
>>>>       {"veth0_to_team", ETH_ALEN},
>>>>       {"veth1_to_team", ETH_ALEN},
>>>>       {"veth0_to_hsr", ETH_ALEN},
>>>>       {"veth1_to_hsr", ETH_ALEN},
>>>>       {"hsr0", 0},
>>>>       {"dummy0", ETH_ALEN},
>>>>       {"nlmon0", 0},
>>>>       {"vxcan1", 0, true},
>>>>       {"caif0", ETH_ALEN},
>>>>       {"batadv0", ETH_ALEN},
>>>>       {netdevsim, ETH_ALEN},
>>>>   };
>>>>   int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
>>>>   if (sock == -1)
>>>>     exit(1);
>>>>   unsigned i;
>>>>   for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++)
>>>>     netlink_add_device(sock, devtypes[i].type, devtypes[i].dev);
>>>>   for (i = 0; i < sizeof(devmasters) / (sizeof(devmasters[0])); i++) {
>>>>     char master[32], slave0[32], veth0[32], slave1[32], veth1[32];
>>>>     sprintf(slave0, "%s_slave_0", devmasters[i]);
>>>>     sprintf(veth0, "veth0_to_%s", devmasters[i]);
>>>>     netlink_add_veth(sock, slave0, veth0);
>>>>     sprintf(slave1, "%s_slave_1", devmasters[i]);
>>>>     sprintf(veth1, "veth1_to_%s", devmasters[i]);
>>>>     netlink_add_veth(sock, slave1, veth1);
>>>>     sprintf(master, "%s0", devmasters[i]);
>>>>     netlink_device_change(sock, slave0, false, master, 0, 0);
>>>>     netlink_device_change(sock, slave1, false, master, 0, 0);
>>>>   }
>>>>   netlink_device_change(sock, "bridge_slave_0", true, 0, 0, 0);
>>>>   netlink_device_change(sock, "bridge_slave_1", true, 0, 0, 0);
>>>>   netlink_add_veth(sock, "hsr_slave_0", "veth0_to_hsr");
>>>>   netlink_add_veth(sock, "hsr_slave_1", "veth1_to_hsr");
>>>>   netlink_add_hsr(sock, "hsr0", "hsr_slave_0", "hsr_slave_1");
>>>>   netlink_device_change(sock, "hsr_slave_0", true, 0, 0, 0);
>>>>   netlink_device_change(sock, "hsr_slave_1", true, 0, 0, 0);
>>>>   for (i = 0; i < sizeof(devices) / (sizeof(devices[0])); i++) {
>>>>     char addr[32];
>>>>     sprintf(addr, DEV_IPV4, i + 10);
>>>>     netlink_add_addr4(sock, devices[i].name, addr);
>>>>     if (!devices[i].noipv6) {
>>>>       sprintf(addr, DEV_IPV6, i + 10);
>>>>       netlink_add_addr6(sock, devices[i].name, addr);
>>>>     }
>>>>     uint64_t macaddr = DEV_MAC + ((i + 10ull) << 40);
>>>>     netlink_device_change(sock, devices[i].name, true, 0, &macaddr,
>>>>                           devices[i].macsize);
>>>>   }
>>>>   close(sock);
>>>> }
>>>> static void initialize_netdevices_init(void)
>>>> {
>>>>   int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
>>>>   if (sock == -1)
>>>>     exit(1);
>>>>   struct {
>>>>     const char* type;
>>>>     int macsize;
>>>>     bool noipv6;
>>>>     bool noup;
>>>>   } devtypes[] = {
>>>>       {"nr", 7, true}, {"rose", 5, true, true},
>>>>   };
>>>>   unsigned i;
>>>>   for (i = 0; i < sizeof(devtypes) / sizeof(devtypes[0]); i++) {
>>>>     char dev[32], addr[32];
>>>>     sprintf(dev, "%s%d", devtypes[i].type, (int)procid);
>>>>     sprintf(addr, "172.30.%d.%d", i, (int)procid + 1);
>>>>     netlink_add_addr4(sock, dev, addr);
>>>>     if (!devtypes[i].noipv6) {
>>>>       sprintf(addr, "fe88::%02hx:%02hx", i, (int)procid + 1);
>>>>       netlink_add_addr6(sock, dev, addr);
>>>>     }
>>>>     int macsize = devtypes[i].macsize;
>>>>     uint64_t macaddr = 0xbbbbbb +
>>>>                        ((unsigned long long)i << (8 * (macsize - 2))) +
>>>>                        (procid << (8 * (macsize - 1)));
>>>>     netlink_device_change(sock, dev, !devtypes[i].noup, 0, &macaddr,
>>>> macsize);
>>>>   }
>>>>   close(sock);
>>>> }
>>>>
>>>> static void setup_common()
>>>> {
>>>>   if (mount(0, "/sys/fs/fuse/connections", "fusectl", 0, 0)) {
>>>>   }
>>>> }
>>>>
>>>> static void loop();
>>>>
>>>> static void sandbox_common()
>>>> {
>>>>   prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
>>>>   setpgrp();
>>>>   setsid();
>>>>   struct rlimit rlim;
>>>>   rlim.rlim_cur = rlim.rlim_max = 200 << 20;
>>>>   setrlimit(RLIMIT_AS, &rlim);
>>>>   rlim.rlim_cur = rlim.rlim_max = 32 << 20;
>>>>   setrlimit(RLIMIT_MEMLOCK, &rlim);
>>>>   rlim.rlim_cur = rlim.rlim_max = 136 << 20;
>>>>   setrlimit(RLIMIT_FSIZE, &rlim);
>>>>   rlim.rlim_cur = rlim.rlim_max = 1 << 20;
>>>>   setrlimit(RLIMIT_STACK, &rlim);
>>>>   rlim.rlim_cur = rlim.rlim_max = 0;
>>>>   setrlimit(RLIMIT_CORE, &rlim);
>>>>   rlim.rlim_cur = rlim.rlim_max = 256;
>>>>   setrlimit(RLIMIT_NOFILE, &rlim);
>>>>   if (unshare(CLONE_NEWNS)) {
>>>>   }
>>>>   if (unshare(CLONE_NEWIPC)) {
>>>>   }
>>>>   if (unshare(0x02000000)) {
>>>>   }
>>>>   if (unshare(CLONE_NEWUTS)) {
>>>>   }
>>>>   if (unshare(CLONE_SYSVSEM)) {
>>>>   }
>>>>   typedef struct {
>>>>     const char* name;
>>>>     const char* value;
>>>>   } sysctl_t;
>>>>   static const sysctl_t sysctls[] = {
>>>>       {"/proc/sys/kernel/shmmax", "16777216"},
>>>>       {"/proc/sys/kernel/shmall", "536870912"},
>>>>       {"/proc/sys/kernel/shmmni", "1024"},
>>>>       {"/proc/sys/kernel/msgmax", "8192"},
>>>>       {"/proc/sys/kernel/msgmni", "1024"},
>>>>       {"/proc/sys/kernel/msgmnb", "1024"},
>>>>       {"/proc/sys/kernel/sem", "1024 1048576 500 1024"},
>>>>   };
>>>>   unsigned i;
>>>>   for (i = 0; i < sizeof(sysctls) / sizeof(sysctls[0]); i++)
>>>>     write_file(sysctls[i].name, sysctls[i].value);
>>>> }
>>>>
>>>> int wait_for_loop(int pid)
>>>> {
>>>>   if (pid < 0)
>>>>     exit(1);
>>>>   int status = 0;
>>>>   while (waitpid(-1, &status, __WALL) != pid) {
>>>>   }
>>>>   return WEXITSTATUS(status);
>>>> }
>>>>
>>>> static int do_sandbox_none(void)
>>>> {
>>>>   if (unshare(CLONE_NEWPID)) {
>>>>   }
>>>>   int pid = fork();
>>>>   if (pid != 0)
>>>>     return wait_for_loop(pid);
>>>>   setup_common();
>>>>   sandbox_common();
>>>>   initialize_netdevices_init();
>>>>   if (unshare(CLONE_NEWNET)) {
>>>>   }
>>>>   initialize_netdevices();
>>>>   loop();
>>>>   exit(1);
>>>> }
>>>>
>>>> #define FS_IOC_SETFLAGS _IOW('f', 2, long)
>>>> static void remove_dir(const char* dir)
>>>> {
>>>>   DIR* dp;
>>>>   struct dirent* ep;
>>>>   int iter = 0;
>>>> retry:
>>>>   while (umount2(dir, MNT_DETACH) == 0) {
>>>>   }
>>>>   dp = opendir(dir);
>>>>   if (dp == NULL) {
>>>>     if (errno == EMFILE) {
>>>>       exit(1);
>>>>     }
>>>>     exit(1);
>>>>   }
>>>>   while ((ep = readdir(dp))) {
>>>>     if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0)
>>>>       continue;
>>>>     char filename[FILENAME_MAX];
>>>>     snprintf(filename, sizeof(filename), "%s/%s", dir, ep->d_name);
>>>>     while (umount2(filename, MNT_DETACH) == 0) {
>>>>     }
>>>>     struct stat st;
>>>>     if (lstat(filename, &st))
>>>>       exit(1);
>>>>     if (S_ISDIR(st.st_mode)) {
>>>>       remove_dir(filename);
>>>>       continue;
>>>>     }
>>>>     int i;
>>>>     for (i = 0;; i++) {
>>>>       if (unlink(filename) == 0)
>>>>         break;
>>>>       if (errno == EPERM) {
>>>>         int fd = open(filename, O_RDONLY);
>>>>         if (fd != -1) {
>>>>           long flags = 0;
>>>>           if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
>>>>             close(fd);
>>>>           continue;
>>>>         }
>>>>       }
>>>>       if (errno == EROFS) {
>>>>         break;
>>>>       }
>>>>       if (errno != EBUSY || i > 100)
>>>>         exit(1);
>>>>       if (umount2(filename, MNT_DETACH))
>>>>         exit(1);
>>>>     }
>>>>   }
>>>>   closedir(dp);
>>>>   int i;
>>>>   for (i = 0;; i++) {
>>>>     if (rmdir(dir) == 0)
>>>>       break;
>>>>     if (i < 100) {
>>>>       if (errno == EPERM) {
>>>>         int fd = open(dir, O_RDONLY);
>>>>         if (fd != -1) {
>>>>           long flags = 0;
>>>>           if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
>>>>             close(fd);
>>>>           continue;
>>>>         }
>>>>       }
>>>>       if (errno == EROFS) {
>>>>         break;
>>>>       }
>>>>       if (errno == EBUSY) {
>>>>         if (umount2(dir, MNT_DETACH))
>>>>           exit(1);
>>>>         continue;
>>>>       }
>>>>       if (errno == ENOTEMPTY) {
>>>>         if (iter < 100) {
>>>>           iter++;
>>>>           goto retry;
>>>>         }
>>>>       }
>>>>     }
>>>>     exit(1);
>>>>   }
>>>> }
>>>>
>>>> static void kill_and_wait(int pid, int* status)
>>>> {
>>>>   kill(-pid, SIGKILL);
>>>>   kill(pid, SIGKILL);
>>>>   int i;
>>>>   for (i = 0; i < 100; i++) {
>>>>     if (waitpid(-1, status, WNOHANG | __WALL) == pid)
>>>>       return;
>>>>     usleep(1000);
>>>>   }
>>>>   DIR* dir = opendir("/sys/fs/fuse/connections");
>>>>   if (dir) {
>>>>     for (;;) {
>>>>       struct dirent* ent = readdir(dir);
>>>>       if (!ent)
>>>>         break;
>>>>       if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
>>>>         continue;
>>>>       char abort[300];
>>>>       snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
>>>>                ent->d_name);
>>>>       int fd = open(abort, O_WRONLY);
>>>>       if (fd == -1) {
>>>>         continue;
>>>>       }
>>>>       if (write(fd, abort, 1) < 0) {
>>>>       }
>>>>       close(fd);
>>>>     }
>>>>     closedir(dir);
>>>>   } else {
>>>>   }
>>>>   while (waitpid(-1, status, __WALL) != pid) {
>>>>   }
>>>> }
>>>>
>>>> #define SYZ_HAVE_SETUP_TEST 1
>>>> static void setup_test()
>>>> {
>>>>   prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
>>>>   setpgrp();
>>>> }
>>>>
>>>> #define SYZ_HAVE_RESET_TEST 1
>>>> static void reset_test()
>>>> {
>>>>   int fd;
>>>>   for (fd = 3; fd < 30; fd++)
>>>>     close(fd);
>>>> }
>>>>
>>>> static void execute_one(void);
>>>>
>>>> #define WAIT_FLAGS __WALL
>>>>
>>>> static void loop(void)
>>>> {
>>>>   int iter;
>>>>   for (iter = 0;; iter++) {
>>>>     char cwdbuf[32];
>>>>     sprintf(cwdbuf, "./%d", iter);
>>>>     if (mkdir(cwdbuf, 0777))
>>>>       exit(1);
>>>>     int pid = fork();
>>>>     if (pid < 0)
>>>>       exit(1);
>>>>     if (pid == 0) {
>>>>       if (chdir(cwdbuf))
>>>>         exit(1);
>>>>       setup_test();
>>>>       execute_one();
>>>>       reset_test();
>>>>       exit(0);
>>>>     }
>>>>     int status = 0;
>>>>     uint64_t start = current_time_ms();
>>>>     for (;;) {
>>>>       if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
>>>>         break;
>>>>       sleep_ms(1);
>>>>       if (current_time_ms() - start < 5 * 1000)
>>>>         continue;
>>>>       kill_and_wait(pid, &status);
>>>>       break;
>>>>     }
>>>>     remove_dir(cwdbuf);
>>>>   }
>>>> }
>>>>
>>>> void execute_one(void)
>>>> {
>>>>   syscall(__NR_unshare, 0x40000000);
>>>> }
>>>> int main(void)
>>>> {
>>>>   syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
>>>>   for (procid = 0; procid < 8; procid++) {
>>>>     if (fork() == 0) {
>>>>       use_temporary_dir();
>>>>       do_sandbox_none();
>>>>     }
>>>>   }
>>>>   sleep(1000000);
>>>>   return 0;
>>>> }
>>>>
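For readers matching the reproducer to the namespace code: the raw values it passes to unshare() are namespace flags. 0x40000000 in execute_one() is CLONE_NEWNET and 0x02000000 in the sandbox setup is CLONE_NEWCGROUP. A small sketch that maps a flag value back to its name follows; ns_flag_name() is a hypothetical helper written for this note, not part of the reproducer or of any library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Older libc headers may lack CLONE_NEWCGROUP; the value is the one
 * used by the Linux UAPI headers. */
#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000
#endif

/* Hypothetical helper: return the symbolic name of a single namespace
 * flag, or NULL if the value is not one of the flags listed here. */
static const char *ns_flag_name(unsigned long flag)
{
    switch (flag) {
    case CLONE_NEWNS:     return "CLONE_NEWNS";
    case CLONE_NEWCGROUP: return "CLONE_NEWCGROUP";
    case CLONE_NEWUTS:    return "CLONE_NEWUTS";
    case CLONE_NEWIPC:    return "CLONE_NEWIPC";
    case CLONE_NEWUSER:   return "CLONE_NEWUSER";
    case CLONE_NEWPID:    return "CLONE_NEWPID";
    case CLONE_NEWNET:    return "CLONE_NEWNET";
    default:              return NULL;
    }
}
```

Calling ns_flag_name(0x40000000) yields "CLONE_NEWNET", which is why every iteration of the reproducer's loop() ends up creating a new network namespace.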
>>>>
>>>> I reviewed the kernel code and found a bug: net_drop_ns() does not
>>>> call net_free() when refcount_dec_and_test() returns zero.
>>> Yes. We don't call net_free when the reference count does not decrement
>>> to zero. The reference count is initialized to 1 a few lines above the
>>> section of code in your patch so that should not be a problem.
>>>
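The point about the reference count can be illustrated with a minimal userspace sketch. It assumes only what is stated above: the count starts at 1, and the free happens only when a decrement brings it to zero. Every fake_* name is a hypothetical stand-in invented for this sketch, not kernel code, and plain ints replace the kernel's refcount_t.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net; only the passive count is modeled. */
struct fake_net {
    int passive;
};

/* Number of objects freed so far, kept so the effect can be observed. */
static int fake_frees;

/* Same contract as refcount_dec_and_test(): true only when the
 * decrement brings the count to zero. */
static bool fake_dec_and_test(int *count)
{
    return --(*count) == 0;
}

/* Allocate an object whose passive count starts at 1. */
static struct fake_net *fake_net_alloc(void)
{
    struct fake_net *net = malloc(sizeof(*net));
    if (net)
        net->passive = 1;
    return net;
}

/* Drop one reference; free the object only when the count reaches zero. */
static void fake_net_drop(struct fake_net *net)
{
    if (net && fake_dec_and_test(&net->passive)) {
        free(net);
        fake_frees++;
    }
}
```

With the count at 1, a single fake_net_drop() frees the object, so in this model the drop on the error path releases the memory just as a direct free would. The object survives a drop only if something else has raised, or overwritten, the count.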
>>>> or
>>>> when rv = down_read_killable(&pernet_ops_rwsem) is negative, there is
>>>> no need to call refcount_dec_and_test().
>>> It doesn't need to but it should be harmless.
>>>
>>>> https://github.com/torvalds/linux/commit/5ba049a5cc8e24a1643df75bbf65b4efa070fa74#diff-9312644e2968a45510bacdd2b2872ad2
>>>> (I can't reproduce this bug on v4.15, or on
>>>> 1bdbe227492075d058e37cb3d400e6468d0095b5 with my patch applied, because
>>>> the previous version of the kernel doesn't have this bug.)
>>>> This bug can lead to a memory leak or DoS.
>>>>
>>>> I made a patch for this bug (it just reverts to the code before that commit).
>>> What am I missing?
>>>
>>> The only thing I can see your patch doing is covering up a memory stomp
>>> that has the effect of changing the value of net->passive. I am not
>>> really keen on hiding bugs of that kind.
>>>
>>>
>>>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>>>> index b02fb19df2cc..9de0ade14956 100644
>>>> --- a/net/core/net_namespace.c
>>>> +++ b/net/core/net_namespace.c
>>>> @@ -431,15 +431,18 @@ struct net *copy_net_ns(unsigned long flags,
>>>>         get_user_ns(user_ns);
>>>>
>>>>         rv = down_read_killable(&pernet_ops_rwsem);
>>>> -       if (rv < 0)
>>>> -               goto put_userns;
>>>> +       if (rv < 0){
>>>> +        net_free(net);
>>>> +        dec_net_namespaces(ucounts);
>>>> +        put_user_ns(user_ns);
>>>> +        return ERR_PTR(rv);
>>>> +    }
>>>>
>>>>         rv = setup_net(net, user_ns);
>>>>
>>>>         up_read(&pernet_ops_rwsem);
>>>>
>>>>         if (rv < 0) {
>>>> -put_userns:
>>>>                 put_user_ns(user_ns);
>>>>                 net_drop_ns(net);
>>>>  dec_ucounts:
>>>>
>>>> And sorry for my encrypted mails.
>>> Eric
>>>