Re: [PATCH] drm/ttm: Use __GFP_NOWARN for huge pages in ttm_pool_alloc_page

From: David Rientjes
Date: Sat Jan 30 2021 - 20:09:26 EST

On Thu, 28 Jan 2021, Michel Dänzer wrote:

> From: Michel Dänzer <mdaenzer@xxxxxxxxxx>
>
> Without __GFP_NOWARN, attempts at allocating huge pages can trigger
> dmesg splats like below (which are essentially noise, since TTM falls
> back to normal pages if it can't get a huge one).
>
> [ 9556.710241] clinfo: page allocation failure: order:9, mode:0x194dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_ZERO|__GFP_NOMEMALLOC), nodemask=(null),cpuset=user.slice,mems_allowed=0
> [ 9556.710259] CPU: 1 PID: 470821 Comm: clinfo Tainted: G E 5.10.10+ #4
> [ 9556.710264] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019
> [ 9556.710268] Call Trace:
> [ 9556.710281] dump_stack+0x6b/0x83
> [ 9556.710288] warn_alloc.cold+0x7b/0xdf
> [ 9556.710297] ? __alloc_pages_direct_compact+0x137/0x150
> [ 9556.710303] __alloc_pages_slowpath.constprop.0+0xc1b/0xc50
> [ 9556.710312] __alloc_pages_nodemask+0x2ec/0x320
> [ 9556.710325] ttm_pool_alloc+0x2e4/0x5e0 [ttm]
> [ 9556.710332] ? kvmalloc_node+0x46/0x80
> [ 9556.710341] ttm_tt_populate+0x37/0xe0 [ttm]
> [ 9556.710350] ttm_bo_handle_move_mem+0x142/0x180 [ttm]
> [ 9556.710359] ttm_bo_validate+0x11d/0x190 [ttm]
> [ 9556.710391] ? drm_vma_offset_add+0x2f/0x60 [drm]
> [ 9556.710399] ttm_bo_init_reserved+0x2a7/0x320 [ttm]
> [ 9556.710529] amdgpu_bo_do_create+0x1b8/0x500 [amdgpu]
> [ 9556.710657] ? amdgpu_bo_subtract_pin_size+0x60/0x60 [amdgpu]
> [ 9556.710663] ? get_page_from_freelist+0x11f9/0x1450
> [ 9556.710789] amdgpu_bo_create+0x40/0x270 [amdgpu]
> [ 9556.710797] ? _raw_spin_unlock+0x16/0x30
> [ 9556.710927] amdgpu_gem_create_ioctl+0x123/0x310 [amdgpu]
> [ 9556.711062] ? amdgpu_gem_force_release+0x150/0x150 [amdgpu]
> [ 9556.711098] drm_ioctl_kernel+0xaa/0xf0 [drm]
> [ 9556.711133] drm_ioctl+0x20f/0x3a0 [drm]
> [ 9556.711267] ? amdgpu_gem_force_release+0x150/0x150 [amdgpu]
> [ 9556.711276] ? preempt_count_sub+0x9b/0xd0
> [ 9556.711404] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> [ 9556.711411] __x64_sys_ioctl+0x83/0xb0
> [ 9556.711417] do_syscall_64+0x33/0x80
> [ 9556.711421] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: bf9eee249ac2 ("drm/ttm: stop using GFP_TRANSHUGE_LIGHT")
> Signed-off-by: Michel Dänzer <mdaenzer@xxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>

Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx> reported the same issue.