Hi Alex!Yes, I am using qemu too on main and crash kernel, with latest kexec-tools, crash 9.0.0 and kernel 6.15Hi Pnina,
Pnina!We’ve defined in our system that when a process crashes, we call panic().
Pnina Feder <pnina.feder@xxxxxxxxxxxx> writes:
We are creating a vmcore using kexec on a Linux 6.15 RISC-V systemThanks for reporting this!
and analyzing it with the crash tool on the host. This workflow
used to work on Linux 6.14 but is now broken in 6.15.
The issue is caused by a change in the kernel:
In Linux 6.15, certain memblock sections are now marked as Reserved
in /proc/iomem. The kexec tool excludes all Reserved regions when
generating the vmcore, so these sections are missing from the dump.
How are you collecting the /proc/vmcore file? A full set of commands would be helpful.
To handle crash recovery, we're using kexec with the following command:
kexec -p /Image --initrd=/rootfs.cpio --append "console=${con} earlycon=${earlycon} no4lvl"
To simulate crash, we trigger it using:
sleep 100 & kill -6 $!
This boots into the crash kernel (kdump), where we then copy the /proc/vmcore file back to the host for analysis.
"......However, the kernel still uses addresses in these regions—forWdym with "IRQ pointers"? Also, what version (sha1) of crash are you using?
example, for IRQ pointers. Since the crash tool needs access to
these memory areas to function correctly, their exclusion breaks the analysis.
We are currently using crash-utility version 9.0.0 (master).
From the crash analysis logs, we observed errors like:
IRQ stack pointer[0] is ffffffd6fbdcc068
crash: read error: kernel virtual address: ffffffd6fbdcc068 type: "IRQ stack pointer".....
<read_kdump: addr: ffffffff80edf1cc paddr: 8010df1cc cnt: 4><readmem: ffffffd6fbdd6880, KVADDR, "runqueues entry (per_cpu)",
3456, (FOE), 55acf03963e0>
read_kdump: addr: ffffffd6fbdd6880 paddr: 8fbdd6880 cnt: 1920<crash: read error: kernel virtual address: ffffffd6fbdd6880 type: "runqueues entry (per_cpu)"
I can't reproduce this issue on qemu, booting with sv39. I'm using the latest kexec-tools (which recently merged riscv .support), crash 9.0.0 and kernel 6.16.0-rc4. Note that I'm using crash in qemu.
Are you able to reproduce this on qemu too?
Maybe that's related to the config, can you share your config?this is my dev_config
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_AUDIT=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BPF_SYSCALL=y
CONFIG_PREEMPT_RT=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_PSI=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
CONFIG_PROFILING=y
CONFIG_KEXEC=y
CONFIG_ARCH_VIRT=y
CONFIG_NONPORTABLE=y
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_HZ_1000=y
CONFIG_CPU_IDLE=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_IOSCHED_BFQ=y
CONFIG_PAGE_REPORTING=y
CONFIG_PERCPU_STATS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_INET_ESP=m
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_FILTER=y
CONFIG_BRIDGE=m
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_VLAN_8021Q=m
CONFIG_NET_SCHED=y
CONFIG_NET_CLS_CGROUP=m
CONFIG_NETLINK_DIAG=y
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_FAILOVER=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_MTD=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_OF=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
CONFIG_VIRTIO_BLK=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_VIRTIO=y
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_NETDEVICES=y
CONFIG_MACB=y
CONFIG_PCS_XPCS=m
CONFIG_SERIO_LIBPS2=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_LEGACY_PTY_COUNT=16
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_I2C=y
CONFIG_I2C_DESIGNWARE_CORE=y
CONFIG_SPI=y
CONFIG_PINCTRL=y
CONFIG_PINCTRL_SINGLE=y
CONFIG_GPIOLIB=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_DWAPB=y
CONFIG_GPIO_SIFIVE=y
CONFIG_POWER_SUPPLY=y
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_REGULATOR=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_SCSI_UFSHCD=y
CONFIG_SCSI_UFSHCD_PLATFORM=y
CONFIG_SCSI_UFS_DWC_TC_PLATFORM=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_M41T80=y
CONFIG_DMADEVICES=y
CONFIG_SYNC_FILE=y
CONFIG_COMMON_CLK_EYEQ=y
CONFIG_RPMSG_CHAR=y
CONFIG_RPMSG_CTRL=y
CONFIG_RPMSG_VIRTIO=y
CONFIG_RESET_CONTROLLER=y
CONFIG_RESET_SIMPLE=y
CONFIG_GENERIC_PHY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_PATH=y
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_BLAKE2B=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRC_CCITT=m
CONFIG_CRC_ITU_T=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=m
CONFIG_PRINTK_TIME=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO_DWARF5=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PTDUMP_DEBUGFS=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_WQ_WATCHDOG=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PLIST=y
CONFIG_DEBUG_SG=y
CONFIG_RCU_EQS_DEBUG=y
CONFIG_MEMTEST=y
These failures occur consistently for addresses in the 0xffffffd000000000 region.
FYI, this region is the direct mapping (see Documentation/arch/riscv/vm-layout.rst).
Thanks,
Alex
Do I have something to try or help to process this issue?
maybe, can you give your Config and I will try it on my system?
Any more information I can share?
Thanks a lot,
Pnina
_______________________________________________Upon inspection, we confirmed that the physical addresses corresponding to those virtual addresses are not present in the vmcore, as they fall under Reserved memory sections.
We tested a patch to kexec-tools that prevents exclusion of the Reserved-memblock section from the vmcore. With this patch, the issue no longer occurs, and crash analysis succeeds.
Note: I suspect the same issue exists on ARM64, as both the signal.c and kexec-tools implementations are similar.
Thanks!
Björn
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv