Re: [syzbot] [mm?] WARNING in folio_large_mapcount
From: Shivank Garg
Date: Tue May 20 2025 - 01:46:54 EST
On 5/19/2025 6:56 PM, David Hildenbrand wrote:
> On 17.05.25 10:21, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 627277ba7c23 Merge tag 'arm64_cbpf_mitigation_2025_05_08' ..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1150f670580000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=5929ac65be9baf3c
>> dashboard link: https://syzkaller.appspot.com/bug?extid=2b99589e33edbe9475ca
>> compiler: Debian clang version 20.1.2 (++20250402124445+58df0ef89dd6-1~exp1~20250402004600.97), Debian LLD 20.1.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/0a42ae72fe0e/disk-627277ba.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/0be88297bb66/vmlinux-627277ba.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/31808a4b1210/bzImage-627277ba.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+2b99589e33edbe9475ca@xxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 38 at ./include/linux/mm.h:1335 folio_large_mapcount+0xd0/0x110 include/linux/mm.h:1335
>
> This should be
>
> VM_WARN_ON_FOLIO(!folio_test_large(folio), folio);
>
>> Modules linked in:
>> CPU: 1 UID: 0 PID: 38 Comm: khugepaged Not tainted 6.15.0-rc6-syzkaller-00025-g627277ba7c23 #0 PREEMPT(full)
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
>> RIP: 0010:folio_large_mapcount+0xd0/0x110 include/linux/mm.h:1335
>> Code: 04 38 84 c0 75 29 8b 03 ff c0 5b 41 5e 41 5f e9 96 d2 2b 09 cc e8 d0 cb 99 ff 48 89 df 48 c7 c6 20 de 77 8b e8 a1 dc de ff 90 <0f> 0b 90 eb b6 89 d9 80 e1 07 80 c1 03 38 c1 7c cb 48 89 df e8 87
>> RSP: 0018:ffffc90000af77e0 EFLAGS: 00010246
>> RAX: e1fcb38c0ff8ce00 RBX: ffffea00014c8000 RCX: e1fcb38c0ff8ce00
>> RDX: 0000000000000001 RSI: ffffffff8d9226df RDI: ffff88801e2fbc00
>> RBP: ffffc90000af7b50 R08: ffff8880b8923e93 R09: 1ffff110171247d2
>> R10: dffffc0000000000 R11: ffffed10171247d3 R12: 1ffffd4000299000
>> R13: dffffc0000000000 R14: 0000000000000000 R15: dffffc0000000000
>> FS: 0000000000000000(0000) GS:ffff8881261fb000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007ffe58f12dc0 CR3: 0000000030e04000 CR4: 00000000003526f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> <TASK>
>> folio_mapcount include/linux/mm.h:1369 [inline]
>
> And here we come through
>
> if (likely(!folio_test_large(folio))) {
> ...
> }
> return folio_large_mapcount(folio);
>
>
> So the folio is split concurrently. And I think there is nothing stopping it from getting freed.
>
> We do a xas_for_each() under RCU. So yes, this is racy.
>
> In collapse_file(), we re-validate everything.
>
> We could
>
> (A) Take proper pagecache locks
>
> (B) Try grabbing a temporary folio reference
>
> (C) Try snapshotting the folio
>
> Probably, in this code, (B) might be cleanest for now? Handling it just like other code in mm/filemap.c.
>
Hi,
I've implemented your suggestion (B) using folio_try_get().
Could you please review if my patch looks correct?
Tested it using existing selftests: sudo make -C tools/testing/selftests/mm run_tests
The other two instances of is_refcount_suitable() use folio locking. Should we
maintain consistency with those?
Thanks,
Shivank
#syz test
From d1c3427e80215fea992428c8b5caf5291725dd65 Mon Sep 17 00:00:00 2001
From: Shivank Garg <shivankg@xxxxxxx>
Date: Mon, 19 May 2025 20:19:32 +0000
Subject: [PATCH] mm/khugepaged: Fix race with folio splitting in
hpage_collapse_scan_file()
folio_mapcount() checks folio_test_large() before proceeding to
folio_large_mapcount(), but there is a race window in which the folio
can be split between the two checks. This triggered the
VM_WARN_ON_FOLIO(!folio_test_large(folio), folio) in
folio_large_mapcount().

Take a temporary folio reference in hpage_collapse_scan_file() to prevent
races with concurrent folio splitting/freeing. This prevents potential
incorrect large folio detection.
Reported-by: syzbot+2b99589e33edbe9475ca@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@xxxxxxxxxx
Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
Signed-off-by: Shivank Garg <shivankg@xxxxxxx>
---
mm/khugepaged.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cc945c6ab3bd..ef4f95409723 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2295,6 +2295,19 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
continue;
}
+ /* Take a reference to prevent any concurrent split or free. */
+ if (!folio_try_get(folio)) {
+ xas_reset(&xas);
+ continue;
+ }
+
+ /* Has the folio been freed or split? */
+ if (unlikely(folio != xas_reload(&xas))) {
+ folio_put(folio);
+ xas_reset(&xas);
+ continue;
+ }
+
if (folio_order(folio) == HPAGE_PMD_ORDER &&
folio->index == start) {
/* Maybe PMD-mapped */
@@ -2305,23 +2318,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
* it's safe to skip LRU and refcount checks before
* returning.
*/
+ folio_put(folio);
break;
}
node = folio_nid(folio);
if (hpage_collapse_scan_abort(node, cc)) {
result = SCAN_SCAN_ABORT;
+ folio_put(folio);
break;
}
cc->node_load[node]++;
if (!folio_test_lru(folio)) {
result = SCAN_PAGE_LRU;
+ folio_put(folio);
break;
}
if (!is_refcount_suitable(folio)) {
result = SCAN_PAGE_COUNT;
+ folio_put(folio);
break;
}
@@ -2333,6 +2350,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
*/
present += folio_nr_pages(folio);
+ folio_put(folio);
if (need_resched()) {
xas_pause(&xas);
--
2.34.1
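
For readers who want to see the pattern in isolation, the sequence the patch
adds (try to take a reference, then reload the slot and compare) can be
modeled in userspace roughly as below. This is only a sketch under stated
assumptions: "struct obj", "obj_try_get()", "obj_put()" and "slot_lookup()"
are hypothetical names standing in for a folio, folio_try_get(), folio_put()
and the xas_reload() check. It uses C11 atomics rather than any kernel API and
does not model RCU or the XArray.

```c
/*
 * Userspace model only: "obj" stands in for a folio and "slot" for a
 * page-cache entry. None of these names are kernel APIs.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	atomic_int refcount;	/* 0 means the object is on its way to being freed */
};

/* Take a reference unless the count has already dropped to zero. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure 'old' is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with obj_try_get(). */
static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
}

/*
 * Speculative lookup: returns a referenced object, or NULL when the caller
 * should restart the lookup.
 */
static struct obj *slot_lookup(struct obj *_Atomic *slot)
{
	struct obj *o = atomic_load(slot);

	if (!o || !obj_try_get(o))
		return NULL;

	/* Revalidate: the slot may have been repointed before 'o' was pinned. */
	if (o != atomic_load(slot)) {
		obj_put(o);
		return NULL;
	}
	return o;
}
```

In this model a NULL return corresponds to the xas_reset() + continue paths in
the patch, and every successful slot_lookup() must be paired with one
obj_put(), which is why each early exit in the patched loop gains a
folio_put().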