Re: [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h

From: Vlastimil Babka
Date: Thu Jan 12 2023 - 04:29:53 EST


On 1/12/23 08:47, Oliver Sang wrote:
> hi, Vlastimil,
>
> On Tue, Jan 10, 2023 at 03:09:36PM +0100, Vlastimil Babka wrote:
>> On 1/10/23 14:53, Oliver Sang wrote:
>> > hi all,
>> >
>> > On Mon, Jan 09, 2023 at 10:01:15PM +0800, Oliver Sang wrote:
>> >>
>> >> On Fri, Jan 06, 2023 at 11:13:15AM +0100, Vlastimil Babka wrote:
>> >>
>> >> > And if the rate at the parent (has it increased thanks to the
>> >> > DEBUG_PAGEALLOC?) is sufficient to bisect to the truly first bad commit. Thanks!
>> >>
>> >> got it. Thanks for suggestion!
>> >>
>> >> since 0af8489b02 is based on v6.1-rc2, we will test (both rectorture and boot)
>> >> with same config upon v6.1-rc2 to see if it's really clean there.
>> >> if so we will use dmesg.invalid_opcode:#[##] to trigger new bisect.
>> >>
>> >> will keep you updated. Thanks
>> >
>> > by more tests, we cannot make sure the v6.1-rc2 is clean, so we also checked
>> > v6.1-rc1 and v6.0. from results, we have low confidence that we can make a
>> > successful bisection based on them [1][2]. could you suggest?
>>
>> So am I reading it right, that the probleam appears to be introduced between
>> v6.0 (0 failures) and v6.1-rc1 (>0 failures)? But agree that with such low
>> incidence, it's hard to bisect.
>
> yeah, you are reading it right :)
>
>>
>> > a further information not sure if it's helpful, [1][2] are both i386 based.
>> > we also tried to run boot tests on x86_64 upon commit 0af8489b02, whatever
>> > with or without CONFIG_DEBUG_PAGEALLOC/CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT,
>> > we never obseve similar issues (also run 999 times).
>>
>> Yeah it looks very much like something that manifests only on i386 (perhaps
>> only in QEMU as well?) and never x86_64.
>>
>> What might be interesting then is v6.1-rc1 with further modified config to
>> enabled CONFIG_SLUB_DEBUG and CONFIG_SLUB_DEBUG_ON. Maybe it will catch the
>> culprit sooner. Or maybe it will obscure the bug instead, unfortunately.
>
> oh, seems, unfortunalately, 'obscure' happen :(
>
> we enabled CONFIG_SLUB_DEBUG and CONFIG_SLUB_DEBUG_ON, along with
> CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
>
> boot (we also add the test for v6.2-rc3):
> =========================================================================================
> compiler/kconfig/rootfs/sleep/tbox_group/testcase:
> gcc-11/i386-randconfig-a012-20221226+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT+CONFIG_SLUB_DEBUG_ON/debian-11.1-i386-20220923.cgz/1/vm-snb/boot
>
> v6.0 v6.1-rc1 v6.1-rc2 56d5a2b9ba85a390473e86b4fe4 0af8489b0216fa1dd83e264bef8 v6.2-rc3
> ---------------- --------------------------- --------------------------- --------------------------- --------------------------- ---------------------------
> fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs %reproduction fail:runs
> | | | | | | | | | | |
> 43:999 3% 68:999 4% 84:999 6% 99:999 5% 94:999 4% 86:999 dmesg.invalid_opcode:#[##]
> 4:999 -0% 2:999 0% 7:999 0% 8:999 0% 4:999 -0% :999 dmesg.kernel_BUG_at_include/linux/mm.h
> 3:999 0% 4:999 0% 3:999 0% 7:999 0% 5:999 1% 9:999 dmesg.kernel_BUG_at_include/linux/page-flags.h
> 34:999 3% 61:999 4% 73:999 5% 81:999 5% 85:999 4% 74:999 dmesg.kernel_BUG_at_lib/list_debug.c
> :999 0% :999 0% :999 0% 1:999 0% :999 0% :999 dmesg.kernel_BUG_at_mm/internal.h
> 3:999 -0% 1:999 -0% :999 -0% 2:999 -0% :999 -0% 2:999 dmesg.kernel_BUG_at_mm/page_alloc.c
> :999 0% :999 0% 2:999 0% :999 0% :999 0% 2:999 dmesg.kernel_BUG_at_mm/usercopy.c
>
>
> since now even the v6.0 is not clean, attached one dmesg FYI

Interesting one, btw.

> [ 38.392164][ T193] page:e6c3e400 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x9e80
> [ 38.393441][ T193] head:e6c3e2c0 order:3 compound_mapcount:0 compound_pincount:0

Yay, struct page pointers no longer hashed for printing, thanks to slub_debug.
Head page is order-3 (8 pages), yet e6c3e400 would be 9th page, nonsense.

> [ 38.394491][ T193] flags: 0x10200(slab|head|zone=0)
> [ 38.395181][ T193] raw: 00010000 e6c3e2c1 00000122 00000400 00000000 00000000 ffffffff 00000001

We have head flag, compound_head is e6c3e2c0+PageTail, there's again
00000400 in mapping that's specifically set by prep_compound_tail().

> [ 38.396386][ T193] raw: 00000000 00000000
> [ 38.396956][ T193] head: 00010200 e6ac7904 c01007f0 c0101a40 c9e79a00 00150004 ffffffff 00000001

00150004 are slub's counters suggesting it's been used for a while, we
should not be racing with e6c3e2c0's allocation here!

> [ 38.398161][ T193] head: 00000000 00000000
> [ 38.398775][ T193] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
> [ 38.399724][ T193] page_owner tracks the page as allocated
> [ 38.400525][ T193] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 193, tgid 193 (systemd-udevd), ts 38392157598, free_ts 0

This is a dump of e6c3e400, not the "head".

> [ 38.403183][ T193] post_alloc_hook+0x29a/0x320
> [ 38.403891][ T193] get_page_from_freelist+0x226/0x310
> [ 38.404644][ T193] __alloc_pages+0xdd/0x360
> [ 38.405256][ T193] alloc_slab_page+0x12d/0x200
> [ 38.405936][ T193] allocate_slab+0x6a/0x670
> [ 38.406586][ T193] new_slab+0x48/0xc0
> [ 38.407127][ T193] ___slab_alloc+0x9ba/0x1610
> [ 38.407942][ T193] __slab_alloc+0x41/0x70
> [ 38.408776][ T193] __kmalloc_track_caller+0x57b/0x710
> [ 38.409538][ T193] kmalloc_reserve+0x3f/0x100
> [ 38.410339][ T193] __alloc_skb+0xad/0x310
> [ 38.410925][ T193] netlink_sendmsg+0x384/0x6f0
> [ 38.411602][ T193] sock_sendmsg+0x79/0xe0
> [ 38.420425][ T193] ____sys_sendmsg+0x1ef/0x390
> [ 38.421069][ T193] ___sys_sendmsg+0xb0/0x120
> [ 38.421726][ T193] __sys_sendmsg+0x84/0x100
> [ 38.422372][ T193] page_owner free stack trace missing
> [ 38.423113][ T193] ------------[ cut here ]------------
> [ 38.423899][ T193] kernel BUG at include/linux/page-flags.h:319!
> [ 38.424778][ T193] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 38.425610][ T193] CPU: 1 PID: 193 Comm: systemd-udevd Tainted: G S 6.0.0 #1 b46e496fab9f0be3d12299e0bf71444aabf1c433
> [ 38.427187][ T193] EIP: folio_flags+0x31/0x70
> [ 38.427951][ T193] Code: 48 83 05 e8 89 bd c5 01 ba a4 c0 2b c4 55 89 e5 83 15 ec 89 bd c5 00 e8 8d 3f f6 ff 83 05 f8 89 bd c5 01 83 15 fc 89 bd c5 00 <0f> 0b 83 05 00 8a bd c5 01 b8 38 1e a6 c4 83 15 04 8a bd c5 00 e8
> [ 38.430464][ T193] EAX: 00000000 EBX: e6c3e400 ECX: 00000000 EDX: ee7ead00
> [ 38.431382][ T193] ESI: 00000000 EDI: 000c2cc0 EBP: c995fb30 ESP: c995fb30
> [ 38.432350][ T193] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210246
> [ 38.433348][ T193] CR0: 80050033 CR2: b5c4f018 CR3: 09cab000 CR4: 000006d0
> [ 38.434288][ T193] Call Trace:
> [ 38.434737][ T193] alloc_slab_page+0x7f/0x200

Aha, we have crashed while allocating a fresh new slab page, probably at
__folio_set_slab(folio), the page doesn't have het PageSlab() flag set.

So we have just took the page from a freelist (that modifies lru.next
overlapping compound_head, we can still see the poison in lru.prev),
verified that it's page->mapping is 0 (in page_expected_state()). Few
moments later there's a compound_head set and mapping set in a way that only
prep_compound_tail() does.
But the head that would be connected seems to have done its
prep_compound_tail()'s a while ago (nevermind that it would have to be an
off-by-one). So that just makes no sense.

> [ 38.435361][ T193] allocate_slab+0x6a/0x670
> [ 38.436007][ T193] new_slab+0x48/0xc0
> [ 38.436536][ T193] ___slab_alloc+0x9ba/0x1610
> [ 38.437293][ T193] ? __might_sleep+0x5a/0x100
> [ 38.437912][ T193] ? __alloc_skb+0xad/0x310
> [ 38.438513][ T193] ? lock_release+0xa1/0x190
> [ 38.439127][ T193] __slab_alloc+0x41/0x70
> [ 38.439845][ T193] __kmalloc_track_caller+0x57b/0x710
> [ 38.440557][ T193] ? kmem_cache_alloc+0x2d6/0x690
> [ 38.441217][ T193] ? __alloc_skb+0xad/0x310
> [ 38.441821][ T193] kmalloc_reserve+0x3f/0x100
> [ 38.442569][ T193] ? security_capable+0x40/0xd0
> [ 38.443214][ T193] __alloc_skb+0xad/0x310
> [ 38.443801][ T193] ? ns_capable_common+0x41/0x80
> [ 38.444452][ T193] netlink_sendmsg+0x384/0x6f0
> [ 38.445089][ T193] ? iovec_from_user+0xea/0x1b0
> [ 38.445737][ T193] sock_sendmsg+0x79/0xe0
> [ 38.446313][ T193] ____sys_sendmsg+0x1ef/0x390
> [ 38.446955][ T193] ___sys_sendmsg+0xb0/0x120
> [ 38.447586][ T193] __sys_sendmsg+0x84/0x100
> [ 38.448225][ T193] __do_sys_socketcall+0x20d/0x5e0
> [ 38.448908][ T193] __ia32_sys_socketcall+0x1b/0x30
> [ 38.449593][ T193] __do_fast_syscall_32+0x72/0xd0
> [ 38.450264][ T193] ? syscall_exit_to_user_mode+0x5f/0x90
> [ 38.451001][ T193] ? __do_fast_syscall_32+0x7c/0xd0
> [ 38.451702][ T193] ? __do_fast_syscall_32+0x7c/0xd0
> [ 38.452388][ T193] ? lockdep_hardirqs_on_prepare+0x242/0x400
> [ 38.453184][ T193] ? syscall_exit_to_user_mode+0x5f/0x90
> [ 38.453922][ T193] ? __do_fast_syscall_32+0x7c/0xd0
> [ 38.454609][ T193] ? lockdep_hardirqs_on_prepare+0x242/0x400
> [ 38.455390][ T193] ? syscall_exit_to_user_mode+0x5f/0x90
> [ 38.460224][ T193] ? __do_fast_syscall_32+0x7c/0xd0
> [ 38.460922][ T193] ? irqentry_exit_to_user_mode+0x23/0x30
> [ 38.461675][ T193] do_fast_syscall_32+0x32/0x70
> [ 38.462327][ T193] do_SYSENTER_32+0x15/0x20
> [ 38.462936][ T193] entry_SYSENTER_32+0xa2/0xfb
> [ 38.463567][ T193] EIP: 0xb7ed4549
> [ 38.464077][ T193] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> [ 38.466578][ T193] EAX: ffffffda EBX: 00000010 ECX: bfc1aa40 EDX: 00000000
> [ 38.467499][ T193] ESI: b7e0f000 EDI: 00000000 EBP: 00534540 ESP: bfc1aa30
> [ 38.468437][ T193] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200282
> [ 38.469442][ T193] Modules linked in:
> [ 38.470183][ T193] ---[ end trace 0000000000000000 ]---
> [ 38.470910][ T193] EIP: folio_flags+0x31/0x70
> [ 38.471704][ T193] Code: 48 83 05 e8 89 bd c5 01 ba a4 c0 2b c4 55 89 e5 83 15 ec 89 bd c5 00 e8 8d 3f f6 ff 83 05 f8 89 bd c5 01 83 15 fc 89 bd c5 00 <0f> 0b 83 05 00 8a bd c5 01 b8 38 1e a6 c4 83 15 04 8a bd c5 00 e8
> [ 38.474274][ T193] EAX: 00000000 EBX: e6c3e400 ECX: 00000000 EDX: ee7ead00
> [ 38.475233][ T193] ESI: 00000000 EDI: 000c2cc0 EBP: c995fb30 ESP: c995fb30
> [ 38.476218][ T193] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210246
> [ 38.477246][ T193] CR0: 80050033 CR2: b5c4f018 CR3: 09cab000 CR4: 000006d0
> [ 38.478206][ T193] Kernel panic - not syncing: Fatal exception
> [ 38.479153][ T193] Kernel Offset: disabled