Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

From: Sasha Levin
Date: Thu Apr 05 2018 - 15:23:06 EST


On Thu, Apr 05, 2018 at 09:49:40AM -0400, Pavel Tatashin wrote:
>> Hi Sasha,
>>
>> I have registered on Azure's portal, and created a VM with 4 CPUs and 16G
>> of RAM. However, I still was not able to reproduce the boot bug you found.
>
>I have also tried to reproduce this issue on Windows 10 + Hyper-V, still
>unsuccessful.

I'm not sure why you can't reproduce it. I built a 4.16 kernel with your 6
patches on top, and booting it on a D64s_v3 instance gives me this:

[ 1.205726] page:ffffea0084000000 is uninitialized and poisoned
[ 1.205737] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1.207016] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 1.208014] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[ 1.209087] ------------[ cut here ]------------
[ 1.210000] kernel BUG at ./include/linux/mm.h:901!
[ 1.210015] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1.211000] Modules linked in:
[ 1.211000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #10
[ 1.211000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 1.211000] RIP: 0010:get_nid_for_pfn+0x6e/0xa0
[ 1.211000] RSP: 0000:ffff881c63cbfc28 EFLAGS: 00010246
[ 1.211000] RAX: 0000000000000000 RBX: ffffea0084000000 RCX: 0000000000000000
[ 1.211000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffed038c797f78
[ 1.211000] RBP: ffff881c63cbfc30 R08: ffff88401174a480 R09: 0000000000000000
[ 1.211000] R10: ffff8840e00d6040 R11: 0000000000000000 R12: 0000000002107fff
[ 1.211000] R13: fffffbfff4648234 R14: 0000000000000001 R15: 0000000000000001
[ 1.211000] FS: 0000000000000000(0000) GS:ffff881c6aa00000(0000) knlGS:0000000000000000
[ 1.211000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.211000] CR2: 0000000000000000 CR3: 0000002814216000 CR4: 00000000003406f0
[ 1.211000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.211000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.211000] Call Trace:
[ 1.211000] register_mem_sect_under_node+0x1a2/0x530
[ 1.211000] link_mem_sections+0x12d/0x200
[ 1.211000] topology_init+0xe6/0x178
[ 1.211000] ? enable_cpu0_hotplug+0x1a/0x1a
[ 1.211000] do_one_initcall+0xb0/0x31f
[ 1.211000] ? initcall_blacklisted+0x220/0x220
[ 1.211000] ? up_write+0x78/0x140
[ 1.211000] ? up_read+0x40/0x40
[ 1.211000] ? __asan_register_globals+0x30/0xa0
[ 1.211000] ? kasan_unpoison_shadow+0x35/0x50
[ 1.211000] kernel_init_freeable+0x69d/0x764
[ 1.211000] ? start_kernel+0x8fd/0x8fd
[ 1.211000] ? finish_task_switch+0x1b6/0x9c0
[ 1.211000] ? rest_init+0x120/0x120
[ 1.211000] kernel_init+0x13/0x150
[ 1.211000] ? rest_init+0x120/0x120
[ 1.211000] ret_from_fork+0x3a/0x50
[ 1.211000] Code: ff df 48 c1 ea 03 80 3c 02 00 75 34 48 8b 03 48 83 f8 ff 74 07 48 c1 e8 36 5b 5d c3 48 c7 c6 00 ca f5 9e 48 89 df e8 82 13 d5 fd <0f> 0b 48 c7 c7 00 24 2e a1 e8 05 36 c1 fe e8 af 07 ea fd eb ac
[ 1.211000] RIP: get_nid_for_pfn+0x6e/0xa0 RSP: ffff881c63cbfc28
[ 1.211017] ---[ end trace d86a03841f7ef229 ]---
[ 1.212020] ==================================================================
[ 1.213000] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x64c/0x810
[ 1.213000] Read of size 8 at addr ffff881c63cbfaf8 by task swapper/0/1
[ 1.213000]
[ 1.213000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 4.16.0+ #10
[ 1.213000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 1.213000] Call Trace:
[ 1.213000] dump_stack+0xe3/0x196
[ 1.213000] ? _atomic_dec_and_lock+0x31a/0x31a
[ 1.213000] ? vprintk_func+0x27/0x60
[ 1.213000] ? printk+0x9c/0xc3
[ 1.213000] ? show_regs_print_info+0x10/0x10
[ 1.213000] ? lock_acquire+0x760/0x760
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] print_address_description+0xe4/0x480
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] kasan_report+0x1d7/0x460
[ 1.213000] ? console_unlock+0x652/0xe90
[ 1.213000] ? update_stack_state+0x64c/0x810
[ 1.213000] __asan_report_load8_noabort+0x19/0x20
[ 1.213000] update_stack_state+0x64c/0x810
[ 1.213000] ? __read_once_size_nocheck.constprop.2+0x50/0x50
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] ? unwind_next_frame+0x202/0x1230
[ 1.213000] unwind_next_frame+0x202/0x1230
[ 1.213000] ? unwind_dump+0x590/0x590
[ 1.213000] ? get_stack_info+0x42/0x3b0
[ 1.213000] ? debug_check_no_locks_freed+0x300/0x300
[ 1.213000] ? __unwind_start+0x170/0x380
[ 1.213000] __save_stack_trace+0x82/0x140
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] save_stack_trace+0x39/0x70
[ 1.213000] save_stack+0x43/0xd0
[ 1.213000] ? save_stack+0x43/0xd0
[ 1.213000] ? __kasan_slab_free+0x11f/0x170
[ 1.213000] ? kasan_slab_free+0xe/0x10
[ 1.213000] ? kmem_cache_free+0xe6/0x560
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] ? _get_random_bytes+0x162/0x5a0
[ 1.213000] ? trace_hardirqs_off+0xd/0x10
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? rcuwait_wake_up+0x15e/0x2c0
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? free_obj_work+0x8a0/0x8a0
[ 1.213000] ? lock_acquire+0x212/0x760
[ 1.213000] ? acct_collect+0x776/0xe80
[ 1.213000] ? acct_collect+0x2e4/0xe80
[ 1.213000] ? acct_collect+0x2e4/0xe80
[ 1.213000] ? lock_acquire+0x760/0x760
[ 1.213000] ? lock_downgrade+0x910/0x910
[ 1.213000] __kasan_slab_free+0x11f/0x170
[ 1.213000] ? put_files_struct+0x2a4/0x390
[ 1.213000] kasan_slab_free+0xe/0x10
[ 1.213000] kmem_cache_free+0xe6/0x560
[ 1.213000] put_files_struct+0x2a4/0x390
[ 1.213000] ? get_files_struct+0x80/0x80
[ 1.213000] ? do_raw_spin_trylock+0x1f0/0x1f0
[ 1.213000] exit_files+0x83/0xc0
[ 1.213000] do_exit+0x9be/0x2190
[ 1.213000] ? do_invalid_op+0x20/0x30
[ 1.213000] ? mm_update_next_owner+0x1200/0x1200
[ 1.213000] ? get_nid_for_pfn+0x6e/0xa0
[ 1.213000] ? get_nid_for_pfn+0x6e/0xa0
[ 1.213000] ? register_mem_sect_under_node+0x1a2/0x530
[ 1.213000] ? link_mem_sections+0x12d/0x200
[ 1.213000] ? topology_init+0xe6/0x178
[ 1.213000] ? enable_cpu0_hotplug+0x1a/0x1a
[ 1.213000] ? do_one_initcall+0xb0/0x31f
[ 1.213000] ? initcall_blacklisted+0x220/0x220
[ 1.213000] ? up_write+0x78/0x140
[ 1.213000] ? up_read+0x40/0x40
[ 1.213000] ? __asan_register_globals+0x30/0xa0
[ 1.213000] ? kasan_unpoison_shadow+0x35/0x50
[ 1.213000] ? kernel_init_freeable+0x69d/0x764
[ 1.213000] ? start_kernel+0x8fd/0x8fd
[ 1.213000] ? finish_task_switch+0x1b6/0x9c0
[ 1.213000] ? rest_init+0x120/0x120
[ 1.213000] rewind_stack_do_exit+0x17/0x20
[ 1.213000]
[ 1.213000] The buggy address belongs to the page:
[ 1.213000] page:ffffea00718f2fc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 1.213000] flags: 0x17ffffc0000000()
[ 1.213000] raw: 0017ffffc0000000 0000000000000000 0000000000000000 00000000ffffffff
[ 1.213000] raw: ffffea00718f2fe0 ffffea00718f2fe0 0000000000000000 0000000000000000
[ 1.213000] page dumped because: kasan: bad access detected
[ 1.213000]
[ 1.213000] Memory state around the buggy address:
[ 1.213000] ffff881c63cbf980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1.213000] ffff881c63cbfa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
[ 1.213000] >ffff881c63cbfa80: f1 f8 f2 f2 f2 00 00 00 00 00 00 00 00 00 f3 f3
[ 1.213000]                                                                 ^
[ 1.213000] ffff881c63cbfb00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1.213000] ffff881c63cbfb80: f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2
[ 1.213000] ==================================================================
[ 1.213033] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1.213033]
[ 1.214000] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b