Re: [linus:master] [mm/mmap] 0503ea8f5b: kernel_BUG_at_mm/filemap.c

From: Liam R. Howlett
Date: Sun Feb 26 2023 - 22:18:25 EST


* kernel test robot <oliver.sang@xxxxxxxxx> [230226 20:33]:
>
> Greeting,
>
> FYI, we noticed kernel_BUG_at_mm/filemap.c due to commit (built with gcc-11):
>
> commit: 0503ea8f5ba73eb3ab13a81c1eefbaf51405385a ("mm/mmap: remove __vma_adjust()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master 0222aa9800b25ff171d6dcabcabcd5c42c6ffc3f]

If this is linux-next, shouldn't that mailing list be in the Cc list?

>
> in testcase: trinity
> version: trinity-static-i386-x86_64-1c734c75-1_2020-01-06
> with following parameters:
>
> runtime: 300s
> group: group-04
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> please be noted, as below table, parent also has other type issues, and we
> found they happen almost at same position of kernel_BUG_at_mm/filemap.c for
> this commit if looking into dmesg (attached two parent dmesgs as well)

I don't understand what you are saying with the above paragraph.

I thought I understood that the bug happens in the previous commits and
there is a dmesg from the previous two parents attached.. but when I
look at the attached two dmesg, they are from the same commit and
neither has anything to do with filemap.

And why would we blame the later commit for the same bug?

Did something go wrong with the bisection?

>
> we don't have knowledge if this commit fixes some problem in parent then
> run further until further issues, but since this commit touches
> mm/filemap.c, we just made out this report FYI

I changed one line in a comment in mm/filemap.c in the commit.

-------------
diff --git a/mm/filemap.c b/mm/filemap.c
index c915ded191f0..992554c18f1f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -97,7 +97,7 @@
* ->i_pages lock (__sync_single_inode)
*
* ->i_mmap_rwsem
- * ->anon_vma.lock (vma_adjust)
+ * ->anon_vma.lock (vma_merge)
*

-----------

Are you sure about this bisection? I'm not saying it isn't my fault or
looking to blame others, but I suspect we are indeed looking at the
wrong commit here.

>
> BTW, we also noticed there is a fix commit
> 07dc4b1862035 (" mm/mremap: fix dup_anon_vma() in vma_merge() case 4")
> by further testing, BUG_at_mm/filemap.c is still existing there.
>
> +---------------------------------------------+------------+------------+
> | | 287051b185 | 0503ea8f5b |
> +---------------------------------------------+------------+------------+
> | BUG:kernel_NULL_pointer_dereference,address | 11 | |
> | Oops:#[##] | 11 | |
> | RIP:dup_anon_vma | 11 | |
> | Kernel_panic-not_syncing:Fatal_exception | 20 | 9 |
> | canonical_address#:#[##] | 9 | |
> | RIP:anon_vma_clone | 9 | |
> | kernel_BUG_at_mm/filemap.c | 0 | 9 |
> | invalid_opcode:#[##] | 0 | 9 |
> | RIP:filemap_unaccount_folio | 0 | 9 |
> +---------------------------------------------+------------+------------+
>

At what commit did problems start showing up regardless of what the
problem was? I did not see any other emails from this bot since
2023-02-12, but clearly it is reporting problems with earlier commits
considering the table above.

>
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Link: https://lore.kernel.org/oe-lkp/202302252122.38b2139-oliver.sang@xxxxxxxxx
>
>
> [ 28.065728][ T4983] ------------[ cut here ]------------
> [ 28.066480][ T4983] kernel BUG at mm/filemap.c:153!
> [ 28.067153][ T4983] invalid opcode: 0000 [#1] SMP PTI
> [ 28.067868][ T4983] CPU: 0 PID: 4983 Comm: trinity-c3 Not tainted 6.2.0-rc4-00443-g0503ea8f5ba7 #1
> [ 28.069001][ T4983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> [ 28.072145][ T4983] RIP: 0010:filemap_unaccount_folio (filemap.c:?)
> [ 28.072927][ T4983] Code: 89 fb 0f ba e0 10 72 05 8b 46 30 eb 0a 8b 46 58 85 c0 7f 07 8b 46 54 85 c0 78 11 48 c7 c6 a0 aa 24 82 48 89 ef e8 0b d2 02 00 <0f> 0b 48 89 ef e8 01 e7 ff ff be 13 00 00 00 48 89 ef 41 89 c4 41
...

> [ 28.087701][ T4983] __filemap_remove_folio (??:?)
> [ 28.088418][ T4983] ? unmap_mapping_range_tree (memory.c:?)
> [ 28.089168][ T4983] ? mapping_can_writeback+0x5/0xc
> [ 28.089940][ T4983] filemap_remove_folio (??:?)
> [ 28.090627][ T4983] truncate_inode_folio (??:?)
> [ 28.091342][ T4983] shmem_undo_range (shmem.c:?)
> [ 28.092036][ T4983] shmem_truncate_range (??:?)
> [ 28.092753][ T4983] shmem_fallocate (shmem.c:?)
> [ 28.093444][ T4983] vfs_fallocate (??:?)
> [ 28.094128][ T4983] madvise_vma_behavior (madvise.c:?)
> [ 28.094874][ T4983] do_madvise (??:?)
> [ 28.095491][ T4983] __ia32_sys_madvise (??:?)
> [ 28.096166][ T4983] do_int80_syscall_32 (??:?)
> [ 28.096885][ T4983] entry_INT80_compat (??:?)

What happened to your line numbers? Didn't these show up before? They
did on 2023-02-06 [1]

...
>
> To reproduce:
>
> # build kernel
> cd linux
> cp config-6.2.0-rc4-00443-g0503ea8f5ba7 .config
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
> make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> cd <mod-install-dir>
> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>

This does not work for me. Since my last use of lkp it seems something
was changed and now -watchdog is not recognized by my qemu and so my
attempts to reproduce this are failing. Is there a way to avoid using
the -watchdog flag? Running the command by hand fails as it seems some
files are removed on exit?

I did try to remove the directories and run from a clean state, but it
still fails for me. (see below)


Thanks,
Liam

1. https://lore.kernel.org/linux-mm/202302062208.24d3e563-oliver.sang@xxxxxxxxx/

Log of failed lkp 68d76160fd7bb767c4a63e7709706b462c475e1b
======================
x86_64
==> Making package: lkp-src 0-1 (Sun Feb 26 09:31:45 PM EST 2023)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing $srcdir/ tree
==> Removing existing $pkgdir/ directory...
==> Starting build()...
make: Entering directory '/home/jedix/lkp-tests/bin/event'
klcc -D_FORTIFY_SOURCE=2 -c -o wakeup.o wakeup.c
klcc -Wl,-O1,--sort-common,--as-needed,-z,relro -static -o wakeup wakeup.o
rm -f wakeup.o
strip wakeup
make: Leaving directory '/home/jedix/lkp-tests/bin/event'
==> Entering fakeroot environment...
x86_64
==> Starting package()...
==> Creating package "lkp-src"...
88466 blocks
renamed '/home/jedix/.lkp/cache/lkp-x86_64.cgz.tmp' -> '/home/jedix/.lkp/cache/lkp-x86_64.cgz'
==> Leaving fakeroot environment.
==> Finished making: lkp-src 0-1 (Sun Feb 26 09:31:47 PM EST 2023)
~/lkp-tests
11 blocks
result_root: /home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0
downloading initrds ...
use local modules: /home/jedix/.lkp/cache/modules.cgz
/usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/yocto/yocto-i386-minimal-20190520.cgz -N -P /home/jedix/.lkp/cache/osimage/yocto
17916 blocks
/usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 https://download.01.org/0day-ci/lkp-qemu/osimage/pkg/debian-x86_64-20180403.cgz/trinity-static-i386-x86_64-1c734c75-1_2020-01-06.cgz -N -P /home/jedix/.lkp/cache/osimage/pkg/debian-x86_64-20180403.cgz
43019 blocks
exec command: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -fsdev local,id=test_dev,path=/home/jedix/.lkp//result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/0,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel bzImage -append root=/dev/ram0 RESULT_ROOT=/result/trinity/group-04-300s/vm-snb/yocto-i386-minimal-20190520.cgz/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/19 BOOT_IMAGE=/pkg/linux/x86_64-kexec/gcc-11/0503ea8f5ba73eb3ab13a81c1eefbaf51405385a/vmlinuz-6.2.0-rc4-00443-g0503ea8f5ba7 branch=linus/master job=/lkp/jobs/scheduled/vm-meta-102/trinity-group-04-300s-yocto-i386-minimal-20190520.cgz-0503ea8f5ba73eb3ab13a81c1eefbaf51405385a-20230224-7240-hzx70n-16.yaml user=lkp ARCH=x86_64 kconfig=x86_64-kexec commit=0503ea8f5ba73eb3ab13a81c1eefbaf51405385a initcall_debug nmi_watchdog=0 vmalloc=256M initramfs_async=0 page_owner=on max_uptime=1200 LKP_LOCAL_RUN=1 selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw ip=dhcp result_service=9p/virtfs_mount -initrd /home/jedix/.lkp/cache/final_initrd -smp 2 -m 3419M -no-reboot -watchdog i6300esb -rtc base=localtime -device e1000,netdev=net0 -netdev user,id=net0 -display none -monitor null -serial stdio
qemu-system-x86_64: -watchdog: invalid option