Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity_checks

From: Ryan Roberts
Date: Fri Jan 05 2024 - 03:15:08 EST


On 05/01/2024 02:20, Yin Fengwei wrote:
>
>
> On 2024/1/5 05:36, David Hildenbrand wrote:
>> On 03.01.24 15:16, Yin, Fengwei wrote:
>>>
>>>
>>> On 1/3/2024 8:13 PM, David Hildenbrand wrote:
>>>> On 03.01.24 12:48, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    ab0b3e6ef50d Add linux-next specific files for 20240102
>>>>> git tree:       linux-next
>>>>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=17be3e09e80000
>>>>> kernel config:
>>>>> https://syzkaller.appspot.com/x/.config?x=a14a6350374945f9
>>>>> dashboard link:
>>>>> https://syzkaller.appspot.com/bug?extid=50ef73537bbc393a25bb
>>>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils
>>>>> for Debian) 2.40
>>>>> syz repro:
>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14e2256ee80000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17b57db5e80000
>>>>>
>>>>> Downloadable assets:
>>>>> disk image:
>>>>> https://storage.googleapis.com/syzbot-assets/4e6376fe5764/disk-ab0b3e6e.raw.xz
>>>>> vmlinux:
>>>>> https://storage.googleapis.com/syzbot-assets/7cb9ecbaf001/vmlinux-ab0b3e6e.xz
>>>>> kernel image:
>>>>> https://storage.googleapis.com/syzbot-assets/2c1a9a6d424f/bzImage-ab0b3e6e.xz
>>>>>
>>>>> The issue was bisected to:
>>>>>
>>>>> commit 68f0320824fa59c5429cbc811e6c46e7a30ea32c
>>>>> Author: David Hildenbrand <david@xxxxxxxxxx>
>>>>> Date:   Wed Dec 20 22:44:31 2023 +0000
>>>>>
>>>>>       mm/rmap: convert folio_add_file_rmap_range() into
>>>>> folio_add_file_rmap_[pte|ptes|pmd]()
>>>>>
>>>>> bisection log:
>>>>> https://syzkaller.appspot.com/x/bisect.txt?x=10b9e1b1e80000
>>>>> final oops:
>>>>> https://syzkaller.appspot.com/x/report.txt?x=12b9e1b1e80000
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=14b9e1b1e80000
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>> commit:
>>>>> Reported-by: syzbot+50ef73537bbc393a25bb@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range()
>>>>> into folio_add_file_rmap_[pte|ptes|pmd]()")
>>>>>
>>>>>    kasan_quarantine_reduce+0x18e/0x1d0 mm/kasan/quarantine.c:283
>>>>>    __kasan_slab_alloc+0x65/0x90 mm/kasan/common.c:324
>>>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>    slab_post_alloc_hook mm/slub.c:3813 [inline]
>>>>>    slab_alloc_node mm/slub.c:3860 [inline]
>>>>>    kmem_cache_alloc+0x136/0x320 mm/slub.c:3867
>>>>>    vm_area_alloc+0x1f/0x220 kernel/fork.c:465
>>>>>    mmap_region+0x3ae/0x2a90 mm/mmap.c:2804
>>>>>    do_mmap+0x890/0xef0 mm/mmap.c:1379
>>>>>    vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573
>>>>>    ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425
>>>>>    __do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline]
>>>>>    __se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
>>>>>    __x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86
>>>>>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>>>>    do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83
>>>>>    entry_SYSCALL_64_after_hwframe+0x62/0x6a
>>>>> ------------[ cut here ]------------
>>>>> WARNING: CPU: 1 PID: 5059 at include/linux/rmap.h:202
>>>>> __folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202
>>>>> Modules linked in:
>>>>> CPU: 1 PID: 5059 Comm: syz-executor115 Not tainted
>>>>> 6.7.0-rc8-next-20240102-syzkaller #0
>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>> BIOS Google 11/17/2023
>>>>> RIP: 0010:__folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202
>>>>> Code: 41 83 e4 01 44 89 e6 e8 79 bc b7 ff 45 84 e4 0f 85 08 fc ff ff
>>>>> e8 3b c1 b7 ff 48 c7 c6 e0 b5 d9 8a 48 89 df e8 5c 12 f7 ff 90 <0f> 0b
>>>>> 90 e9 eb fb ff ff e8 1e c1 b7 ff be 01 00 00 00 48 89 df e8
>>>>> RSP: 0018:ffffc900038df978 EFLAGS: 00010293
>>>>> RAX: 0000000000000000 RBX: ffffea00008cde00 RCX: ffffffff81687419
>>>>> RDX: ffff88807becbb80 RSI: ffffffff81d06104 RDI: 0000000000000000
>>>>> RBP: ffffea00008cde00 R08: 0000000000000000 R09: fffffbfff1e75f6a
>>>>> R10: ffffffff8f3afb57 R11: 0000000000000001 R12: 0000000000000000
>>>>> R13: 0000000000000001 R14: 0000000000000000 R15: dffffc0000000000
>>>>> FS:  0000555556508380(0000) GS:ffff8880b9900000(0000)
>>>>> knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 00000000200000c0 CR3: 0000000079000000 CR4: 00000000003506f0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>    <TASK>
>>>>>    __folio_add_rmap mm/rmap.c:1167 [inline]
>>>>>    __folio_add_file_rmap mm/rmap.c:1452 [inline]
>>>>>    folio_add_file_rmap_ptes+0x8e/0x2c0 mm/rmap.c:1478
>>>>>    insert_page_into_pte_locked.isra.0+0x34d/0x960 mm/memory.c:1874
>>>>>    insert_page mm/memory.c:1900 [inline]
>>>>>    vm_insert_page+0x62c/0x8c0 mm/memory.c:2053
>>>>>    packet_mmap+0x314/0x570 net/packet/af_packet.c:4594
>>>>>    call_mmap include/linux/fs.h:2090 [inline]
>>>>>    mmap_region+0x745/0x2a90 mm/mmap.c:2819
>>>>>    do_mmap+0x890/0xef0 mm/mmap.c:1379
>>>>>    vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573
>>>>>    ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425
>>>>>    __do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline]
>>>>>    __se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
>>>>>    __x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86
>>>>>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>>>>    do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83
>>>>>    entry_SYSCALL_64_after_hwframe+0x62/0x6a
>>>>
>>>> If I am not wrong, that triggers:
>>>>
>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>>            !folio_test_large_rmappable(folio), folio);
>>>>
>>>> So we are trying to rmap a large folio that did not go through
>>>> folio_prep_large_rmappable().

Would someone mind explaining the rules to me for this? As far as I can see,
folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
remember to deinit the list on destruction. Why can't we just init that list for
all folios order-2 or greater? Then everything is rmappable?

>>>>
>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stored
>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I
>>>> suspect it's simply some compound allocation.
>>> Looks like:
>>>    alloc_pg_vec
>>>      alloc_one_pg_vec_page
>>>           gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>                             __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>
>>>           buffer = (char *) __get_free_pages(gfp_flags, order);
>>> So you are right here... :).
>>
>> Hm, but I wonder if this is something that's supposed to work or is this one
>> of the cases where we should actually use a VM_PFNMAP mapping?
>>
>> It's not a pagecache(file/shmem) page after all.
>>
>> We could relax that check and document why we expect something that is not
>> marked rmappable. But it feels wrong. I suspect this should be a VM_PFNMAP
>> instead (like recent udmabuf changes).
>
> VM_PFNMAP looks correct.

And why is making the folio rmappable and mapping it the normal way not the
right solution here? Because the folio could be order-1? Or something more profound?

>
> I do have another question: why do we just check the large folio
> rmappable? Does that mean order0 folio is always rmappable?
>
> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
> and drivers call vm_insert_page(). I suppose they all need to be VM_PFNMAP.
>
> There is no warning because we don't check whether an order-0 folio is rmappable.
>
>
> Regards
> Yin, Fengwei
>