Re: [PATCH v4 4/5] kdump: wait for DMA to finish when using CMA

From: David Hildenbrand
Date: Tue Jun 03 2025 - 09:15:15 EST


On 30.05.25 22:29, Jiri Bohac wrote:
When re-using the CMA area for kdump there is a risk of pending DMA into
pinned user pages in the CMA area.

Pages that are pinned long-term are migrated away from CMA, so these are
not a concern. Pages pinned without FOLL_LONGTERM remain in the CMA and may
possibly be the source or destination of a pending DMA transfer.

I'll note that we currently have an upstream bug where that is sometimes not the case. I mentioned previously that such bugs will be a problem :(

https://lkml.kernel.org/r/20250523023709epcms1p236d4f55b79adb9366ec1cf6d5792b06b@epcms1p2


Although there is no clear specification of how long a page may be pinned
without FOLL_LONGTERM, pinning without the flag shows an intent of the
caller to only use the memory for short-lived DMA transfers, not a transfer
initiated by a device asynchronously at a random time in the future.

Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump
kernel, giving such short-lived DMA transfers time to finish before the CMA
memory is re-used by the kdump kernel.

Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily as both
a huge margin for a DMA transfer, yet not increasing the kdump time
too significantly.

Signed-off-by: Jiri Bohac <jbohac@xxxxxxx>

---
Changes since v3:
- renamed CMA_DMA_TIMEOUT_MSEC to CMA_DMA_TIMEOUT_SEC, changed delay to 10 seconds
- introduced cma_dma_timeout_sec, initialized to CMA_DMA_TIMEOUT_SEC,
to make the timeout trivially tunable if needed in the future

---
include/linux/crash_core.h | 3 +++
kernel/crash_core.c | 17 +++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 44305336314e..805a07042c96 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -56,6 +56,9 @@ static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
/* Alignment required for elf header segment */
#define ELF_CORE_HEADER_ALIGN 4096
+/* Default value for cma_dma_timeout_sec */
+#define CMA_DMA_TIMEOUT_SEC 10
+
extern int crash_exclude_mem_range(struct crash_mem *mem,
unsigned long long mstart,
unsigned long long mend);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 335b8425dd4b..a255c9e2ef29 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -21,6 +21,7 @@
#include <linux/reboot.h>
#include <linux/btf.h>
#include <linux/objtool.h>
+#include <linux/delay.h>
#include <asm/page.h>
#include <asm/sections.h>
@@ -33,6 +34,11 @@
/* Per cpu memory for storing cpu states in case of system crash. */
note_buf_t __percpu *crash_notes;
+/* time to wait for possible DMA to finish before starting the kdump kernel
+ * when a CMA reservation is used
+ */
+unsigned int cma_dma_timeout_sec = CMA_DMA_TIMEOUT_SEC;
+
#ifdef CONFIG_CRASH_DUMP
int kimage_crash_copy_vmcoreinfo(struct kimage *image)
@@ -97,6 +103,17 @@ int kexec_crash_loaded(void)
}
EXPORT_SYMBOL_GPL(kexec_crash_loaded);
+static void crash_cma_clear_pending_dma(void)
+{
+ unsigned int s = cma_dma_timeout_sec;
+
+ if (!crashk_cma_cnt)
+ return;
+
+ while (s--)
+ mdelay(1000);

Any reason we cannot do it in a single mdelay() invocation?

mdelay() already is a loop around udelay on larger values IIUC.
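
To make the comparison concrete, here is a minimal userspace sketch, not kernel code. The udelay() and mdelay() below are stand-ins that only add up the requested delay, and the mdelay() stand-in models the looping behaviour described above as an assumption about the generic implementation. The helper names wait_per_second_loop() and wait_single_call() are hypothetical; crashk_cma_cnt and cma_dma_timeout_sec mirror the names in the patch, with values picked for illustration.

```c
#include <stdint.h>

/* Userspace stand-ins: accumulate the requested delay instead of spinning. */
static uint64_t total_delay_us;

static void udelay(unsigned long usecs)
{
	total_delay_us += usecs;
}

/* Assumed model of mdelay(): a loop of udelay(1000), one per millisecond. */
static void mdelay(unsigned long msecs)
{
	while (msecs--)
		udelay(1000);
}

/* Illustrative values; in the patch these are kernel globals. */
static unsigned int crashk_cma_cnt = 1;
static unsigned int cma_dma_timeout_sec = 10;

/* Variant from the patch: one mdelay(1000) per second of timeout. */
static uint64_t wait_per_second_loop(void)
{
	unsigned int s = cma_dma_timeout_sec;

	total_delay_us = 0;
	if (!crashk_cma_cnt)
		return 0;

	while (s--)
		mdelay(1000);

	return total_delay_us;
}

/* Variant asked about: a single mdelay() covering the whole timeout. */
static uint64_t wait_single_call(void)
{
	total_delay_us = 0;
	if (!crashk_cma_cnt)
		return 0;

	/*
	 * The multiplication can wrap for very large timeouts where
	 * unsigned long is 32 bits wide; irrelevant for a 10 second default.
	 */
	mdelay(cma_dma_timeout_sec * 1000UL);

	return total_delay_us;
}
```

Under this model both variants request the same total delay, so the outer loop only adds code. Whether a given architecture's mdelay() accepts an argument this large is something to check separately.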

--
Cheers,

David / dhildenb