Re: [PATCH v16 20/20] drm/panfrost: Switch to generic memory shrinker

From: Steven Price
Date: Wed Sep 06 2023 - 06:55:41 EST


On 05/09/2023 09:08, Boris Brezillon wrote:
> On Mon, 4 Sep 2023 14:20:24 +0100
> Steven Price <steven.price@xxxxxxx> wrote:
>
>> On 03/09/2023 18:07, Dmitry Osipenko wrote:
>>> Replace Panfrost's custom memory shrinker with a common drm-shmem
>>> memory shrinker.
>>>
>>> Tested-by: Steven Price <steven.price@xxxxxxx> # Firefly-RK3288
>>
>> I just gave this version of the series a spin and I can trigger the following
>> warning:
>>
>> [ 477.776163] ------------[ cut here ]------------
>> [ 477.781353] WARNING: CPU: 0 PID: 292 at drivers/gpu/drm/drm_gem_shmem_helper.c:227 drm_gem_shmem_free+0x1fc/0x200 [drm_shmem_helper]
>> [ 477.794790] panfrost ffa30000.gpu: drm_WARN_ON(refcount_read(&shmem->pages_use_count))
>> [ 477.794797] Modules linked in: panfrost gpu_sched drm_shmem_helper
>> [ 477.810942] CPU: 0 PID: 292 Comm: glmark2-es2-drm Not tainted 6.5.0-rc2-00527-gc8a0c16fa830 #1
>> [ 477.820564] Hardware name: Rockchip (Device Tree)
>> [ 477.825820] unwind_backtrace from show_stack+0x10/0x14
>> [ 477.831670] show_stack from dump_stack_lvl+0x58/0x70
>> [ 477.837319] dump_stack_lvl from __warn+0x7c/0x1a4
>> [ 477.842680] __warn from warn_slowpath_fmt+0x134/0x1a0
>> [ 477.848429] warn_slowpath_fmt from drm_gem_shmem_free+0x1fc/0x200 [drm_shmem_helper]
>> [ 477.857199] drm_gem_shmem_free [drm_shmem_helper] from drm_gem_handle_delete+0x84/0xb0
>> [ 477.866163] drm_gem_handle_delete from drm_ioctl+0x214/0x4ec
>> [ 477.872592] drm_ioctl from sys_ioctl+0x568/0xd48
>> [ 477.877857] sys_ioctl from ret_fast_syscall+0x0/0x1c
>> [ 477.883504] Exception stack(0xf0a49fa8 to 0xf0a49ff0)
>> [ 477.889148] 9fa0: 005969c0 bef34880 00000006 40086409 bef34880 00000001
>> [ 477.898289] 9fc0: 005969c0 bef34880 40086409 00000036 bef34880 00590b64 00590aec 00000000
>> [ 477.907428] 9fe0: b6ec408c bef3485c b6ead42f b6c31f98
>> [ 477.913188] irq event stamp: 37296889
>> [ 477.917319] hardirqs last enabled at (37296951): [<c03c1968>] __up_console_sem+0x50/0x60
>> [ 477.926531] hardirqs last disabled at (37296972): [<c03c1954>] __up_console_sem+0x3c/0x60
>> [ 477.935714] softirqs last enabled at (37296986): [<c03016cc>] __do_softirq+0x318/0x4d4
>> [ 477.944708] softirqs last disabled at (37296981): [<c034f9ec>] __irq_exit_rcu+0x140/0x160
>> [ 477.953878] ---[ end trace 0000000000000000 ]---
>>
>> So something, somewhere has gone wrong with the reference counts.
>
> Missing `got_pages_sgt = true;` in the fault handler, when creating the
> sgt and populating the first 2MB chunk, I guess (should have been part
> of "drm/shmem-helper: Use flag for tracking page count bumped by
> get_pages_sgt()"). This kinda proves my point though: adding flags
> for things that can be inferred from other fields is a bad idea, because
> there's always the risk of not updating all the places that are manually
> filling these other fields...

Yes, that seems to fix the problem. And I agree: derived fields like this
are often problematic, so it's better to avoid them whenever possible.
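
To illustrate the general point, here is a minimal userspace sketch. It is
not the drm_gem_shmem_helper code: the struct, its fields and every fake_*
helper are hypothetical stand-ins, and it only assumes the pattern described
above, where one path sets the flag and a second path fills the same fields
by hand but forgets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shmem object; NOT the real kernel struct. */
struct fake_obj {
	int pages_use_count;	/* references held on the backing pages */
	void *sgt;		/* non-NULL once a scatter-gather table exists */
	bool got_pages_sgt;	/* redundant flag: "sgt creation took a page ref" */
};

/* Common path: takes a page reference, stores the sgt and sets the flag. */
static void fake_get_pages_sgt(struct fake_obj *o, void *sgt)
{
	assert(o != NULL);
	o->pages_use_count++;
	o->sgt = sgt;
	o->got_pages_sgt = true;
}

/* Second path filling the same fields by hand, but forgetting the flag. */
static void fake_fault_populate(struct fake_obj *o, void *sgt)
{
	assert(o != NULL);
	o->pages_use_count++;
	o->sgt = sgt;
	/* missing: o->got_pages_sgt = true; */
}

/*
 * Release keyed on the redundant flag. Returns the leftover page count;
 * a non-zero result is the condition a free-time WARN would catch.
 */
static int fake_free_using_flag(struct fake_obj *o)
{
	assert(o != NULL);
	if (o->got_pages_sgt) {
		o->pages_use_count--;
		o->got_pages_sgt = false;
	}
	o->sgt = NULL;
	return o->pages_use_count;
}

/* Release keyed on the field the flag was derived from (the sgt itself). */
static int fake_free_using_sgt(struct fake_obj *o)
{
	assert(o != NULL);
	if (o->sgt)
		o->pages_use_count--;
	o->sgt = NULL;
	o->got_pages_sgt = false;
	return o->pages_use_count;
}
```

With an object populated through fake_fault_populate(),
fake_free_using_flag() leaves one reference behind, whereas
fake_free_using_sgt() drops it, because the sgt pointer cannot get out of
sync with itself.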

Steve