Re: [PATCH 0/1] Fix for riscv vmcore issue

From: Alexandre Ghiti
Date: Wed Jul 16 2025 - 02:58:36 EST


Hi Pnina,

On 7/14/25 14:00, Pnina Feder wrote:
Hi Pnina,
Pnina!

Pnina Feder <pnina.feder@xxxxxxxxxxxx> writes:

We are creating a vmcore using kexec on a Linux 6.15 RISC-V system
and analyzing it with the crash tool on the host. This workflow
used to work on Linux 6.14 but is now broken in 6.15.
Thanks for reporting this!

The issue is caused by a change in the kernel:
In Linux 6.15, certain memblock sections are now marked as Reserved
in /proc/iomem. The kexec tool excludes all Reserved regions when
generating the vmcore, so these sections are missing from the dump.
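To make the described exclusion concrete, here is a minimal sketch of a name-based filter over /proc/iomem-style text. It is not the kexec-tools code: the function names, the filter rule, and the sample layout are all assumptions made for illustration, and the addresses are made up rather than taken from the reporter's system.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name".
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

def parse_iomem(text):
    """Return (depth, start, end, name) tuples from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        # Nested resources are indented by two spaces per level.
        regions.append((len(indent) // 2, int(start, 16), int(end, 16), name.strip()))
    return regions

def dumpable_ranges(regions, include_reserved=False):
    """Top-level ranges kept by a simple name-based filter (hypothetical rule)."""
    wanted = {"System RAM"}
    if include_reserved:
        wanted.add("Reserved")
    return [(s, e) for depth, s, e, name in regions if depth == 0 and name in wanted]

# Made-up layout for illustration only.
SAMPLE = """\
800000000-80003ffff : Reserved
800040000-8fbcfffff : System RAM
  800200000-801ffffff : Kernel image
8fbd00000-8fbffffff : Reserved
8fc000000-8ffffffff : System RAM
"""

if __name__ == "__main__":
    for start, end in dumpable_ranges(parse_iomem(SAMPLE)):
        print(f"{start:x}-{end:x}")
```

With include_reserved left at False, any address inside a top-level Reserved range has no backing range in the result, which is the situation the report describes.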
How are you collecting the /proc/vmcore file? A full set of commands would be helpful.
We’ve defined in our system that when a process crashes, we call panic().
To handle crash recovery, we're using kexec with the following command:
kexec -p /Image --initrd=/rootfs.cpio --append "console=${con} earlycon=${earlycon} no4lvl"

To simulate crash, we trigger it using:
sleep 100 & kill -6 $!

This boots into the crash kernel (kdump), where we then copy the /proc/vmcore file back to the host for analysis.

However, the kernel still uses addresses in these regions—for
example, for IRQ pointers. Since the crash tool needs access to
these memory areas to function correctly, their exclusion breaks the analysis.
What do you mean by "IRQ pointers"? Also, which version (sha1) of crash are you using?

We are currently using crash-utility version 9.0.0 (master).
From the crash analysis logs, we observed errors like:

"......
IRQ stack pointer[0] is ffffffd6fbdcc068
crash: read error: kernel virtual address: ffffffd6fbdcc068 type: "IRQ stack pointer"
.....

<read_kdump: addr: ffffffff80edf1cc paddr: 8010df1cc cnt: 4>
<readmem: ffffffd6fbdd6880, KVADDR, "runqueues entry (per_cpu)", 3456, (FOE), 55acf03963e0>
<read_kdump: addr: ffffffd6fbdd6880 paddr: 8fbdd6880 cnt: 1920>
crash: read error: kernel virtual address: ffffffd6fbdd6880 type: "runqueues entry (per_cpu)"

I can't reproduce this issue on qemu, booting with sv39. I'm using the latest kexec-tools (which recently merged riscv support), crash 9.0.0 and kernel 6.16.0-rc4. Note that I'm using crash in qemu.

Are you able to reproduce this on qemu too?
Yes, I am also using qemu for both the main and crash kernels, with the latest kexec-tools, crash 9.0.0 and kernel 6.15.


Maybe that's related to the config, can you share your config?
This is my dev_config:

CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_AUDIT=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BPF_SYSCALL=y
CONFIG_PREEMPT_RT=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_PSI=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_CHECKPOINT_RESTORE=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
CONFIG_PROFILING=y
CONFIG_KEXEC=y
CONFIG_ARCH_VIRT=y
CONFIG_NONPORTABLE=y
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_HZ_1000=y
CONFIG_CPU_IDLE=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_IOSCHED_BFQ=y
CONFIG_PAGE_REPORTING=y
CONFIG_PERCPU_STATS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_INET_ESP=m
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_FILTER=y
CONFIG_BRIDGE=m
CONFIG_BRIDGE_VLAN_FILTERING=y
CONFIG_VLAN_8021Q=m
CONFIG_NET_SCHED=y
CONFIG_NET_CLS_CGROUP=m
CONFIG_NETLINK_DIAG=y
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_CGROUP_NET_PRIO=y
CONFIG_FAILOVER=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_MTD=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_OF=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
CONFIG_VIRTIO_BLK=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_VIRTIO=y
CONFIG_MD=y
CONFIG_BLK_DEV_DM=y
CONFIG_NETDEVICES=y
CONFIG_MACB=y
CONFIG_PCS_XPCS=m
CONFIG_SERIO_LIBPS2=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_LEGACY_PTY_COUNT=16
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_OF_PLATFORM=y
CONFIG_SERIAL_EARLYCON_RISCV_SBI=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_I2C=y
CONFIG_I2C_DESIGNWARE_CORE=y
CONFIG_SPI=y
CONFIG_PINCTRL=y
CONFIG_PINCTRL_SINGLE=y
CONFIG_GPIOLIB=y
CONFIG_GPIO_SYSFS=y
CONFIG_GPIO_DWAPB=y
CONFIG_GPIO_SIFIVE=y
CONFIG_POWER_SUPPLY=y
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_REGULATOR=y
CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_SCSI_UFSHCD=y
CONFIG_SCSI_UFSHCD_PLATFORM=y
CONFIG_SCSI_UFS_DWC_TC_PLATFORM=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_M41T80=y
CONFIG_DMADEVICES=y
CONFIG_SYNC_FILE=y
CONFIG_COMMON_CLK_EYEQ=y
CONFIG_RPMSG_CHAR=y
CONFIG_RPMSG_CTRL=y
CONFIG_RPMSG_VIRTIO=y
CONFIG_RESET_CONTROLLER=y
CONFIG_RESET_SIMPLE=y
CONFIG_GENERIC_PHY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_PATH=y
CONFIG_CRYPTO_RSA=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_BLAKE2B=m
CONFIG_CRYPTO_XXHASH=m
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRC_CCITT=m
CONFIG_CRC_ITU_T=y
CONFIG_CRC7=y
CONFIG_LIBCRC32C=m
CONFIG_PRINTK_TIME=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO_DWARF5=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PTDUMP_DEBUGFS=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_WQ_WATCHDOG=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PLIST=y
CONFIG_DEBUG_SG=y
CONFIG_RCU_EQS_DEBUG=y
CONFIG_MEMTEST=y

These failures occur consistently for addresses in the 0xffffffd000000000 region.

FYI, this region is the direct mapping (see Documentation/arch/riscv/vm-layout.rst).
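As a small worked example of what "direct mapping" implies here: the read_kdump line in the log above pairs addr ffffffd6fbdd6880 with paddr 8fbdd6880. Assuming the whole region is mapped at one constant offset, that single pair is enough to translate the failing IRQ stack pointer as well. The helper names below are hypothetical, and the offset is derived from the log rather than from kernel sources.

```python
def linear_offset(vaddr, paddr):
    """Constant virtual-minus-physical offset of a linear (direct) mapping."""
    return vaddr - paddr

def virt_to_phys(vaddr, offset):
    """Translate a direct-map virtual address using a known offset."""
    return vaddr - offset

# (addr, paddr) pair taken from the read_kdump line in the crash log.
OFFSET = linear_offset(0xffffffd6fbdd6880, 0x8fbdd6880)

# The "IRQ stack pointer[0]" value that crash failed to read.
irq_stack_paddr = virt_to_phys(0xffffffd6fbdcc068, OFFSET)

if __name__ == "__main__":
    print(hex(OFFSET))           # 0xffffffce00000000
    print(hex(irq_stack_paddr))  # 0x8fbdcc068
```

Under that assumption, both failing reads land within a few tens of KiB of each other in physical memory, around 0x8fbdc0000 to 0x8fbde0000.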

Thanks,

Alex

Hi Alex!

Is there anything I can try, or any way I can help make progress on this issue?
Maybe you could share your config so I can try it on my system?
Is there any more information I can share?


So I'm able to reproduce your issue with your config. It only happens with kexec_load(), not kexec_file_load().

Your patch does not fix the problem for me; makedumpfile still fails. I spent quite some time looking for the code that parses the memory regions and exposes them as PT_LOAD segments in vmcore, but I did not find it. Do you know where that happens for kexec_load()?

Thanks,

Alex



Thanks a lot,
Pnina

Upon inspection, we confirmed that the physical addresses corresponding to those virtual addresses are not present in the vmcore, as they fall under Reserved memory sections.
We tested a patch to kexec-tools that prevents exclusion of the Reserved-memblock section from the vmcore. With this patch, the issue no longer occurs, and crash analysis succeeds.
Note: I suspect the same issue exists on ARM64, as both the signal.c and kexec-tools implementations are similar.
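One way to repeat the "not present in the vmcore" check is to list the PT_LOAD program headers of the dump and test whether a physical address falls inside any of them. The sketch below assumes a little-endian ELF64 image whose headers are fully contained in the bytes passed in. The function names are hypothetical, and it does not handle the extended program-header count (e_phnum of 0xffff).

```python
import struct

# ELF64 file header and program header layouts (little-endian).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1

def load_segments(image):
    """Return (p_paddr, p_memsz) for every PT_LOAD header in an ELF64 image."""
    fields = EHDR.unpack_from(image, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    # EI_CLASS == 2 means ELF64, EI_DATA == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    segments = []
    for i in range(e_phnum):
        p_type, _flags, _offset, _vaddr, p_paddr, _filesz, p_memsz, _align = \
            PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_paddr, p_memsz))
    return segments

def paddr_in_dump(segments, paddr):
    """True if paddr lies inside one of the PT_LOAD physical ranges."""
    return any(start <= paddr < start + size for start, size in segments)
```

For a real dump, reading the start of the file (for example the first 64 KiB of the copied vmcore, assuming the headers fit in that) and calling paddr_in_dump(load_segments(data), 0x8fbdd6880) would show whether the address from the log is covered.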

Thanks!
Björn
_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv