BUG: BISECTED: memleak in vfs_write()

From: Mirsad Goran Todorovac
Date: Fri Jan 27 2023 - 13:37:04 EST


Hi all,

I came across a memory leak in the vanilla mainline (Torvalds tree) kernel
built with MGLRU and CONFIG_KMEMLEAK enabled:

unreferenced object 0xffff8d7c92ad5180 (size 192):
comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
hex dump (first 32 bytes):
c0 59 ad 92 7c 8d ff ff 60 dd d7 31 7c 8d ff ff .Y..|...`..1|...
60 55 df 97 ff ff ff ff 09 00 02 00 00 00 00 00 `U..............
backtrace:
[<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
[<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
[<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
[<ffffffff96451963>] append_filter_err+0x113/0x1d0
[<ffffffff96453c0a>] create_event_filter+0xba/0xe0
[<ffffffff96454b18>] set_trigger_filter+0x98/0x160
[<ffffffff96456554>] event_trigger_parse+0x104/0x180
[<ffffffff96455823>] trigger_process_regex+0xc3/0x110
[<ffffffff964558f7>] event_trigger_write+0x77/0xe0
[<ffffffff96623a41>] vfs_write+0xd1/0x420
[<ffffffff9662413b>] ksys_write+0x7b/0x100
[<ffffffff966241e9>] __x64_sys_write+0x19/0x20
[<ffffffff971c9188>] do_syscall_64+0x58/0x80
[<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff8d7b076be000 (size 32):
comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
hex dump (first 32 bytes):
0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 61 0a 00 00 . Command: a...
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
[<ffffffff96557a8d>] __kmalloc+0x4d/0xd0
[<ffffffff96438314>] tracing_log_err+0x184/0x1b0
[<ffffffff96451963>] append_filter_err+0x113/0x1d0
[<ffffffff96453c0a>] create_event_filter+0xba/0xe0
[<ffffffff96454b18>] set_trigger_filter+0x98/0x160
[<ffffffff96456554>] event_trigger_parse+0x104/0x180
[<ffffffff96455823>] trigger_process_regex+0xc3/0x110
[<ffffffff964558f7>] event_trigger_write+0x77/0xe0
[<ffffffff96623a41>] vfs_write+0xd1/0x420
[<ffffffff9662413b>] ksys_write+0x7b/0x100
[<ffffffff966241e9>] __x64_sys_write+0x19/0x20
[<ffffffff971c9188>] do_syscall_64+0x58/0x80
[<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff8d7c92ad59c0 (size 192):
comm "ftracetest", pid 2738512, jiffies 4335176280 (age 4843.088s)
hex dump (first 32 bytes):
c0 5c ad 92 7c 8d ff ff 80 51 ad 92 7c 8d ff ff .\..|....Q..|...
60 55 df 97 ff ff ff ff 01 00 0b 00 00 00 00 00 `U..............
backtrace:
[<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
[<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
[<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
[<ffffffff96451963>] append_filter_err+0x113/0x1d0
[<ffffffff96453c0a>] create_event_filter+0xba/0xe0
[<ffffffff96454b18>] set_trigger_filter+0x98/0x160
[<ffffffff96456554>] event_trigger_parse+0x104/0x180
[<ffffffff96455823>] trigger_process_regex+0xc3/0x110
[<ffffffff964558f7>] event_trigger_write+0x77/0xe0
[<ffffffff96623a41>] vfs_write+0xd1/0x420
[<ffffffff9662413b>] ksys_write+0x7b/0x100
[<ffffffff966241e9>] __x64_sys_write+0x19/0x20
[<ffffffff971c9188>] do_syscall_64+0x58/0x80
[<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
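
For triage, reports like the ones above can be grouped by allocation site. What follows is
only a minimal Python sketch, assuming the text was saved from /sys/kernel/debug/kmemleak;
the function names, the regular expressions and the list of allocator frames to skip are
illustrative assumptions, not part of any kernel tooling.

```python
import re
from collections import Counter

# Header line of each leaked object, e.g.
#   unreferenced object 0xffff8d7c92ad5180 (size 192):
OBJECT_RE = re.compile(r"unreferenced object 0x[0-9a-f]+ \(size (\d+)\)")
# Backtrace frame, e.g.
#   [<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Allocator-internal frames to skip when looking for the real caller.
# Assumption: this set only covers the frames seen in this report.
ALLOCATOR_FRAMES = {"__kmem_cache_alloc_node", "kmalloc_trace", "__kmalloc"}


def summarize_kmemleak(report: str) -> Counter:
    """Count leaked objects per (size, first non-allocator caller)."""
    counts = Counter()
    size = None  # size of the object whose backtrace is being read
    for line in report.splitlines():
        obj = OBJECT_RE.search(line)
        if obj:
            size = int(obj.group(1))
            continue
        frame = FRAME_RE.search(line)
        if frame and size is not None and frame.group(1) not in ALLOCATOR_FRAMES:
            counts[(size, frame.group(1))] += 1
            size = None  # count only the first caller of each object
    return counts


def decode_hex_dump(hex_bytes: str) -> bytes:
    """Turn the hex column of a kmemleak dump line into bytes, minus trailing NULs."""
    return bytes.fromhex(hex_bytes).rstrip(b"\x00")
```

Applied to the three objects above, summarize_kmemleak() attributes two 192-byte objects and
one 32-byte object to tracing_log_err(), and decode_hex_dump() on the first dump line of the
32-byte object yields the string "\n  Command: a\n".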

The bug was noticed on a Lenovo desktop 10TX000VCR (LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB)
running AlmaLinux 8.7 (Stone Smilodon), a CentOS clone, with the compiler:

mtodorov@domac:~/linux/kernel/linux_torvalds$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
mtodorov@domac:~/linux/kernel/linux_torvalds$

Bisecting gave the following culprit commit:

git bisect good a92ce570c81dc0feaeb12a429b4bc65686d17967
# good: [c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a] ipmi/watchdog: use strscpy() to instead of strncpy()
git bisect good c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a
# good: [90b12f423d3c8a89424c7bdde18e1923dfd0941e] Merge tag 'for-linus-6.2-1' of https://github.com/cminyard/linux-ipmi
git bisect good 90b12f423d3c8a89424c7bdde18e1923dfd0941e
# first bad commit: [71946a25f357a51dcce849367501d7fb04c0465b] Merge tag 'mmc-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

The commit was merged on December 13th, 2022.

It is a merge commit, and a huge one.

The selftests/ftrace/ftracetest script triggers this leak, sometimes several times in one run.
ftracetest requires root permission to run, and I have not yet worked out whether a non-superuser
could devise an automated script that abuses this leak to exhaust all of the kernel's memory.

A non-root user gets a "Permission denied" (EACCES) error when trying to access the /proc/sys/kernel internals:

[marvin@pc-mtodorov linux_torvalds]$ tools/testing/selftests/ftrace/ftracetest
Error: this must be run by root user
tools/testing/selftests/ftrace/ftracetest: line 46: /proc/sys/kernel/sched_rt_runtime_us: Permission denied
[marvin@pc-mtodorov linux_torvalds]$
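
One way to narrow down the non-superuser question is to check which event trigger files an
unprivileged account may write at all, since the leaking path starts in event_trigger_write().
The following is a hedged Python sketch: the helper name is hypothetical, /sys/kernel/tracing
is assumed to be the tracefs mount point (older setups use /sys/kernel/debug/tracing), and a
writable file only means the write path is reachable, not that the leak is exploitable.

```python
import os
from typing import Iterator


def writable_trigger_files(events_dir: str) -> Iterator[str]:
    """Yield 'trigger' files under events_dir that the current user may write."""
    # os.walk() silently skips directories it is not allowed to read.
    for dirpath, _dirnames, filenames in os.walk(events_dir):
        if "trigger" in filenames:
            path = os.path.join(dirpath, "trigger")
            # os.access() checks permissions for the real UID/GID.
            if os.access(path, os.W_OK):
                yield path


if __name__ == "__main__":
    # Assumed tracefs location; adjust if tracefs is mounted elsewhere.
    for path in writable_trigger_files("/sys/kernel/tracing/events"):
        print(path)
```

Run as the unprivileged user, an empty output suggests that the trigger files cannot be
written without root on that system.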

Hope this helps.

According to the Code of Conduct, I have Cc:-ed the maintainers listed by get_maintainer.pl, and
I will add Thorsten because this is sort of a regression :-)

Regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

Attachment: config-6.1.0-rc1-mglru-kmemlk-00007-g6dbd4341b9da.xz
Description: application/xz