[PATCH v1 1/3] drm/buddy: add a flag to disable trimming of non cleared blocks
From: Pierre-Eric Pelloux-Prayer
Date: Wed Jul 02 2025 - 12:14:17 EST
A vkcts test case is triggering a case where the drm buddy allocator
wastes lots of memory and performs badly:
dEQP-VK.memory.allocation.basic.size_8KiB.reverse.count_4000
For each memory pool type, the test will allocate 4000 8kB objects,
and then will release them. The alignment request is 256kB.
For each object, the allocator will select a 256kB block (to
match the alignment), and then trim it to 8kB, adding lots of free
entries to the free_lists of order 5 to 1.
On deallocation, none of these objects will be merged with their
buddy because their "clear status" is different: only the block
that was handed over to the driver might come back cleared.
Also since the test don't allocate much memory, the allocator don't
need to force the merge process so it will repeat the same logic
for each run.
As a result, after the first run (which takes about 6sec), the
freelists look like this:
chunk_size: 4KiB, total: 16368MiB, free: 15354MiB, clear_free: 397MiB
[...]
order- 5 free: 1914 MiB, blocks: 15315
order- 4 free: 957 MiB, blocks: 15325
order- 3 free: 480 MiB, blocks: 15360
order- 2 free: 239 MiB, blocks: 15347
order- 1 free: 238 MiB, blocks: 30489
After the second run (19 sec):
chunk_size: 4KiB, total: 16368MiB, free: 15374MiB, clear_free: 537MiB
[...]
order- 5 free: 3326 MiB, blocks: 26615
order- 4 free: 1663 MiB, blocks: 26619
order- 3 free: 833 MiB, blocks: 26659
order- 2 free: 416 MiB, blocks: 26643
order- 1 free: 414 MiB, blocks: 53071
list_insert_sorted is part of the problem here since it iterates
over the free_list to figure out where to insert the new blocks.
To fix this while keeping the clear tracking information, a new
bit is exposed to drivers, allowing them to disable trimming for
blocks that aren't "clear". This bit is used by amdgpu because
it always returns cleared memory to drm_buddy.
With this bit set, the "merge buddies on deallocation logic" can
work again, and the free_list are not growing indefinitely anymore.
So after a run we get:
chunk_size: 4KiB, total: 16368MiB, free: 15306MiB, clear_free: 1734MiB
[...]
order- 5 free: 2 MiB, blocks: 17
order- 4 free: 2 MiB, blocks: 35
order- 3 free: 1 MiB, blocks: 41
order- 2 free: 656 KiB, blocks: 41
order- 1 free: 256 KiB, blocks: 32
The runtime is better (2 sec) and stable across multiple runs, and we
also see that the reported "clear_free" amount is larger than without
the patch.
Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 8 ++++++++
drivers/gpu/drm/drm_buddy.c | 1 +
include/drm/drm_buddy.h | 1 +
3 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index abdc52b0895a..dbbaa15a973e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -499,6 +499,14 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
INIT_LIST_HEAD(&vres->blocks);
+ /* Trimming create smaller blocks that may never be given to the driver.
+ * Such blocks won't be cleared until being seen by the driver, which might
+ * never occur (for instance UMD might request large alignment) => in such
+ * case, upon release of the block, the drm_buddy allocator won't merge them
+ * back, because their clear status is different.
+ */
+ vres->flags = DRM_BUDDY_TRIM_IF_CLEAR;
+
if (place->flags & TTM_PL_FLAG_TOPDOWN)
vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index a1e652b7631d..555c72abce4c 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -1092,6 +1092,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
/* Trim the allocated block to the required size */
if (!(flags & DRM_BUDDY_TRIM_DISABLE) &&
+ (!(flags & DRM_BUDDY_TRIM_IF_CLEAR) || drm_buddy_block_is_clear(block)) &&
original_size != size) {
struct list_head *trim_list;
LIST_HEAD(temp);
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index 9689a7c5dd36..c338d03028c3 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -28,6 +28,7 @@
#define DRM_BUDDY_CLEAR_ALLOCATION BIT(3)
#define DRM_BUDDY_CLEARED BIT(4)
#define DRM_BUDDY_TRIM_DISABLE BIT(5)
+#define DRM_BUDDY_TRIM_IF_CLEAR BIT(6)
struct drm_buddy_block {
#define DRM_BUDDY_HEADER_OFFSET GENMASK_ULL(63, 12)
--
2.43.0