RE: Arm64 crash while reading memory sysfs

From: Qian Cai (QUIC)
Date: Wed May 26 2021 - 08:09:32 EST

> -----Original Message-----
> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> Sent: Wednesday, May 26, 2021 2:40 AM
> To: Qian Cai (QUIC) <quic_qiancai@xxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; David Hildenbrand <david@xxxxxxxxxx>; Catalin Marinas
> <catalin.marinas@xxxxxxx>; Anshuman Khandual <anshuman.khandual@xxxxxxx>; Ard Biesheuvel <ardb@xxxxxxxxxx>; Linux
> Memory Management List <linux-mm@xxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>; Marc Zyngier <maz@xxxxxxxxxx>; Linux Kernel
> Mailing List <linux-kernel@xxxxxxxxxxxxxxx>; Linux ARM <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: Arm64 crash while reading memory sysfs
>
> Hi,
>
> On Tue, May 25, 2021 at 03:25:59PM +0000, Qian Cai (QUIC) wrote:
> > Reverting the patchset "arm64: drop pfn_valid_within() and simplify pfn_valid()" [1] from today's linux-next fixed a crash while reading files under /sys/devices/system/memory.
>
> Can you please send the beginning of the boot log, up to the
> "Memory: xK/yK available ..."
> line?

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
[ 0.000000] Linux version 5.13.0-rc3-next-20210525+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #27 SMP Tue May 25 19:03:24 UTC 2021
[ 0.000000] efi: EFI v2.70 by American Megatrends
[ 0.000000] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dbed98
[ 0.000000] esrt: Reserving ESRT space from 0x0000009ff1d18298 to 0x0000009ff1d182f8.
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x0000009FF5B40000 000024 (v02 ALASKA)
[ 0.000000] ACPI: XSDT 0x0000009FF5B40028 000094 (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: FACP 0x0000009FF5B400C0 000114 (v06 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: DSDT 0x0000009FF5B401D8 00765A (v05 ALASKA A M I 00000001 INTL 20190509)
[ 0.000000] ACPI: FIDT 0x0000009FF5B47838 00009C (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: DBG2 0x0000009FF5B478D8 000061 (v00 Ampere eMAG 00000000 INTL 20190509)
[ 0.000000] ACPI: GTDT 0x0000009FF5B47940 000108 (v02 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: IORT 0x0000009FF5B47A48 000BCC (v00 Ampere eMAG 00000000 INTL 20190509)
[ 0.000000] ACPI: MCFG 0x0000009FF5B48618 0000AC (v01 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: SSDT 0x0000009FF5B486C8 00002D (v02 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: SPMI 0x0000009FF5B486F8 000041 (v05 ALASKA A M I 00000000 AMI. 00000000)
[ 0.000000] ACPI: APIC 0x0000009FF5B48740 000A68 (v04 Ampere eMAG 00000004 01000013)
[ 0.000000] ACPI: PCCT 0x0000009FF5B491A8 0005D0 (v01 Ampere eMAG 00000003 01000013)
[ 0.000000] ACPI: BERT 0x0000009FF5B49778 000030 (v01 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: HEST 0x0000009FF5B497A8 000328 (v01 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: SPCR 0x0000009FF5B49AD0 000050 (v02 A M I APTIO V 01072009 AMI. 0005000D)
[ 0.000000] ACPI: PPTT 0x0000009FF5B49B20 000CB8 (v01 Ampere eMAG 00000003 01000013)
[ 0.000000] ACPI: SPCR: console: pl011,mmio32,0x12600000,115200
[ 0.000000] NUMA: Failed to initialise from firmware
[ 0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x0000009fffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x9ffefbabc0-0x9ffefbffff]
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000090000000-0x0000009fffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000090000000-0x0000000091ffffff]
[ 0.000000] node 0: [mem 0x0000000092000000-0x00000000928fffff]
[ 0.000000] node 0: [mem 0x0000000092900000-0x00000000fffbffff]
[ 0.000000] node 0: [mem 0x00000000fffc0000-0x00000000ffffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x0000000fffffffff]
[ 0.000000] node 0: [mem 0x0000008800000000-0x0000009ff5aeffff]
[ 0.000000] node 0: [mem 0x0000009ff5af0000-0x0000009ff5b2ffff]
[ 0.000000] node 0: [mem 0x0000009ff5b30000-0x0000009ff5baffff]
[ 0.000000] node 0: [mem 0x0000009ff5bb0000-0x0000009ff7deffff]
[ 0.000000] node 0: [mem 0x0000009ff7df0000-0x0000009ff7e5ffff]
[ 0.000000] node 0: [mem 0x0000009ff7e60000-0x0000009ff7ffffff]
[ 0.000000] node 0: [mem 0x0000009ff8000000-0x0000009fffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000090000000-0x0000009fffffffff]
[ 0.000000] kasan: KernelAddressSanitizer initialized
[ 0.000000] psci: probing for conduit method from ACPI.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: MIGRATE_INFO_TYPE not supported.
[ 0.000000] psci: SMC Calling Convention v65535.65535
[ 0.000000] ACPI: SRAT not present
[ 0.000000] percpu: Embedded 10 pages/cpu s584592 r8192 d62576 u655360
[ 0.000000] pcpu-alloc: s584592 r8192 d62576 u655360 alloc=10*65536
[ 0.000000] pcpu-alloc: [0] 00 [0] 01 [0] 02 [0] 03 [0] 04 [0] 05 [0] 06 [0] 07
[ 0.000000] pcpu-alloc: [0] 08 [0] 09 [0] 10 [0] 11 [0] 12 [0] 13 [0] 14 [0] 15
[ 0.000000] pcpu-alloc: [0] 16 [0] 17 [0] 18 [0] 19 [0] 20 [0] 21 [0] 22 [0] 23
[ 0.000000] pcpu-alloc: [0] 24 [0] 25 [0] 26 [0] 27 [0] 28 [0] 29 [0] 30 [0] 31
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: GIC system register CPU interface
[ 0.000000] CPU features: detected: Spectre-v2
[ 0.000000] CPU features: detected: Spectre-v4
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2091012
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210525+ root=/dev/mapper/ubuntu--vg-ubuntu--lv ro cma=1024M iommu.passthrough=1
[ 0.000000] Unknown command line parameters: BOOT_IMAGE=/vmlinuz-5.13.0-rc3-next-20210525+ cma=1024M
[ 0.000000] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.000000] Memory: 777216K/133955584K available (17920K kernel code, 118786K rwdata, 4416K rodata, 6080K init, 67276K bss, 17379072K reserved, 0K cma-reserved)

>
> > [1] https://lore.kernel.org/kvmarm/20210511100550.28178-1-rppt@xxxxxxxxxx/
> >
> > [ 247.669668][ T1443] kernel BUG at include/linux/mm.h:1383!
> > [ 247.675987][ T1443] Internal error: Oops - BUG: 0 [#1] SMP
> > [ 247.681472][ T1443] Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
> > [ 247.696894][ T1443] CPU: 15 PID: 1443 Comm: ranbug Not tainted 5.13.0-rc3-next-20210524+ #11
> > [ 247.705326][ T1443] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
> > [ 247.713842][ T1443] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > [ 247.720536][ T1443] pc : test_pages_in_a_zone+0x23c/0x300
> > [ 247.725935][ T1443] lr : test_pages_in_a_zone+0x23c/0x300
> > [ 247.731327][ T1443] sp : ffff800023f8f670
> > [ 247.735327][ T1443] x29: ffff800023f8f670 x28: 000000000000a000 x27: 000000000000a000
> > [ 247.743156][ T1443] x26: ffffffbfffe00000 x25: ffff800011c6f738 x24: dfff800000000000
> > [ 247.750984][ T1443] x23: 0000000000002000 x22: ffff009f7efa29c0 x21: 0000000000000000
> > [ 247.758812][ T1443] x20: ffffffffffffffff x19: 0000000000008000 x18: ffff00084f9d3370
> > [ 247.766640][ T1443] x17: 0000000000000000 x16: 0000000000000007 x15: 0000000000000078
> > [ 247.774467][ T1443] x14: 0000000000000000 x13: ffff800011c6eea4 x12: ffff60136cee0574
> > [ 247.782295][ T1443] x11: 1fffe0136cee0573 x10: ffff60136cee0573 x9 : dfff800000000000
> > [ 247.790123][ T1443] x8 : ffff009b67702b9b x7 : 0000000000000001 x6 : ffff009b67702b98
> > [ 247.797951][ T1443] x5 : 00009fec9311fa8d x4 : ffff009b67702b98 x3 : 1fffe00109f3a529
> > [ 247.805778][ T1443] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000034
> > [ 247.813606][ T1443] Call trace:
> > [ 247.816738][ T1443] test_pages_in_a_zone+0x23c/0x300
> > [ 247.821784][ T1443] valid_zones_show+0x1e0/0x298
> > [ 247.826483][ T1443] dev_attr_show+0x50/0xc8
> > [ 247.830747][ T1443] sysfs_kf_seq_show+0x164/0x368
> > [ 247.835533][ T1443] kernfs_seq_show+0x130/0x198
> > [ 247.840143][ T1443] seq_read_iter+0x344/0xd50
> > [ 247.844581][ T1443] kernfs_fop_read_iter+0x32c/0x4a8
> > [ 247.849625][ T1443] new_sync_read+0x2bc/0x4e8
> > [ 247.854063][ T1443] vfs_read+0x18c/0x340
> > [ 247.858066][ T1443] ksys_read+0xf8/0x1e0
> > [ 247.862068][ T1443] __arm64_sys_read+0x74/0xa8
> > [ 247.866591][ T1443] invoke_syscall.constprop.0+0xdc/0x1d8
> > [ 247.872072][ T1443] do_el0_svc+0xe4/0x298
> > [ 247.876162][ T1443] el0_svc+0x20/0x30
> > [ 247.879906][ T1443] el0_sync_handler+0xb0/0xb8
> > [ 247.884429][ T1443] el0_sync+0x178/0x180
> > [ 247.888435][ T1443] Code: b0005ee1 912b8021 910b0021 97fc57ac (d4210000)
> > [ 247.895217][ T1443] ---[ end trace 4ff9f5cbe7443f54 ]---
> > [ 247.900522][ T1443] Kernel panic - not syncing: Oops - BUG: Fatal exception
> > [ 247.907501][ T1443] SMP: stopping secondary CPUs
> > [ 247.912122][ T1443] Kernel Offset: disabled
> > [ 247.916296][ T1443] CPU features: 0x00000251,20000846
> > [ 247.921340][ T1443] Memory Limit: none
> > [ 247.925100][ T1443] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
> >
>
> --
> Sincerely yours,
> Mike.
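
The call trace quoted above goes through valid_zones_show(), i.e. a read of a memory block's valid_zones attribute under /sys/devices/system/memory. As an illustration only, here is a minimal sketch of how such reads could be exercised from userspace. It is not the "ranbug" program from the trace; the function name read_valid_zones and its root parameter are hypothetical, and it assumes the kernel exposes memoryN/valid_zones at all. On an affected kernel a read like this may trigger the BUG, so it should only be run on a test machine.

```python
import os


def read_valid_zones(root="/sys/devices/system/memory"):
    """Read the valid_zones attribute of every memoryN block under root.

    Returns a dict mapping block name (e.g. "memory0") to the attribute
    text, or to an "<error: ...>" string if the read failed.
    Returns an empty dict if root does not exist.
    """
    results = {}
    if not os.path.isdir(root):
        return results
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "valid_zones")
        # Skip non-block entries and blocks without the attribute.
        if not name.startswith("memory") or not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                results[name] = f.read().strip()
        except OSError as err:
            results[name] = f"<error: {err}>"
    return results
```

Calling read_valid_zones() with no arguments walks the real sysfs tree; passing another directory as root allows a dry run against a copy of the layout.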