Re: [PATCH 2/2] drm: Revert syncobj timeline changes.

From: Eric Anholt
Date: Fri Nov 09 2018 - 17:27:04 EST


Eric Anholt <eric@xxxxxxxxxx> writes:

> zhoucm1 <zhoucm1@xxxxxxx> writes:
>
>> On 2018年11月09日 00:52, Christian König wrote:
>>> Am 08.11.18 um 17:07 schrieb Koenig, Christian:
>>>> Am 08.11.18 um 17:04 schrieb Eric Anholt:
>>>>> Daniel suggested I submit this, since we're still seeing regressions
>>>>> from it. This is a revert to before 48197bc564c7 ("drm: add syncobj
>>>>> timeline support v9") and its followon fixes.
>>>> This is a harmless false positive from lockdep, Chouming and I are
>>>> already working on a fix.
>>>
>>> On the other hand we had enough trouble with that patch, so if it
>>> really bothers you feel free to add my Acked-by: Christian König
>>> <christian.koenig@xxxxxxx> and push it.
>> NAK, please no, I don't think this needed, the Warning totally isn't
>> related to syncobj timeline, but fence-array implementation flaw, just
>> exposed by syncobj.
>> In addition, Christian already has a fix for this Warning, I've tested.
>> Please Christian send to public review.
>
> I backed out my revert of #2 (#1 still necessary) after adding the
> lockdep regression fix, and now my CTS run got oomkilled after just a
> few hours, with these notable lines in the unreclaimable slab info list:
>
> [ 6314.373099] drm_sched_fence 69095KB 69095KB
> [ 6314.373653] kmemleak_object 428249KB 428384KB
> [ 6314.373736] kmalloc-262144 256KB 256KB
> [ 6314.373743] kmalloc-131072 128KB 128KB
> [ 6314.373750] kmalloc-65536 64KB 64KB
> [ 6314.373756] kmalloc-32768 1472KB 1728KB
> [ 6314.373763] kmalloc-16384 64KB 64KB
> [ 6314.373770] kmalloc-8192 208KB 208KB
> [ 6314.373778] kmalloc-4096 2408KB 2408KB
> [ 6314.373784] kmalloc-2048 288KB 336KB
> [ 6314.373792] kmalloc-1024 1457KB 1512KB
> [ 6314.373800] kmalloc-512 854KB 1048KB
> [ 6314.373808] kmalloc-256 188KB 268KB
> [ 6314.373817] kmalloc-192 69141KB 69142KB
> [ 6314.373824] kmalloc-64 47703KB 47704KB
> [ 6314.373886] kmalloc-128 46396KB 46396KB
> [ 6314.373894] kmem_cache 31KB 35KB
>
> No results from kmemleak, though.

OK, it looks like the #2 revert probably isn't related to the OOM issue.
Running a single job on otherwise unused DRM, watching /proc/slabinfo
every second for drm_sched_fence, I get:

drm_sched_fence 0 0 192 21 1 : tunables 32 16 8 : slabdata 0 0 0 : globalstat 0 0 0 0 0 0 0 0 0 : cpustat 0 0 0 0
drm_sched_fence 16 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 13 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 6 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 4 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 2 21 192 21 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0
drm_sched_fence 0 21 192 21 1 : tunables 32 16 8 : slabdata 0 1 0 : globalstat 16 16 1 0 0 0 0 0 0 : cpustat 5 1 6 0

So we generate a ton of fences, and I guess free them slowly because of
RCU? And presumably kmemleak was sucking up lots of memory because of
how many of these objects were lying around.
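
For anyone who wants to repeat the measurement, here is a rough sketch of
the once-a-second sampling described above. It is not the command that was
actually used; the helper names (parse_slabinfo_line, active_bytes,
objects_for_kb, read_cache, watch) are made up for illustration. It assumes
the usual /proc/slabinfo column order (name, active_objs, num_objs, objsize,
objperslab, pagesperslab), which matches the lines quoted above, and that
the file is readable (normally root only). objects_for_kb() is only a
back-of-the-envelope estimate that ignores per-slab overhead.

```python
import time


def parse_slabinfo_line(line):
    """Parse the leading fields of one /proc/slabinfo data line.

    Assumed layout (as in the lines quoted in this mail):
    name active_objs num_objs objsize objperslab pagesperslab : tunables ...
    """
    head = line.split(":")[0].split()
    active_objs, num_objs, objsize, objperslab, pagesperslab = map(int, head[1:6])
    return {
        "name": head[0],
        "active_objs": active_objs,
        "num_objs": num_objs,
        "objsize": objsize,
        "objperslab": objperslab,
        "pagesperslab": pagesperslab,
    }


def active_bytes(entry):
    """Bytes held by live objects in the cache (objects * object size)."""
    return entry["active_objs"] * entry["objsize"]


def objects_for_kb(kb, objsize):
    """Rough object count for a cache of `kb` KiB, ignoring slab overhead."""
    return (kb * 1024) // objsize


def read_cache(cache, path="/proc/slabinfo"):
    """Return the parsed entry for `cache`, or None if it is not listed."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == cache:
                return parse_slabinfo_line(line)
    return None


def watch(cache, samples=10, interval=1.0, path="/proc/slabinfo"):
    """Print active/total object counts for `cache` once per interval."""
    for _ in range(samples):
        entry = read_cache(cache, path)
        if entry is None:
            print(f"{cache}: not present")
        else:
            print(
                f"{cache}: {entry['active_objs']}/{entry['num_objs']} objects, "
                f"{active_bytes(entry)} bytes active"
            )
        time.sleep(interval)
```

Calling watch("drm_sched_fence") as root while a job runs would show the
active count rising and then draining, as in the samples above. With the
192-byte object size from those lines, objects_for_kb(69095, 192) puts the
69095KB reported in the OOM dump at roughly 368,000 fences.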
