Re: Kernel and AMD hardware roulette ( was AMD graphics performance regression in 4.15 and later )

From: Gabriel C
Date: Wed Jun 06 2018 - 11:45:37 EST


2018-06-06 17:03 GMT+02:00 Michel Dänzer <michel@xxxxxxxxxxx>:
> On 2018-06-06 04:44 PM, Christian König wrote:
>> On 06.06.2018 at 16:12, Michel Dänzer wrote:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@xxxxxxx>:
>>>>> On 06.06.2018 at 14:08, Gabriel C wrote:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@xxxxxxx>:
>>>>>>> On 06.06.2018 at 13:28, Gabriel C wrote:
>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>>
>>>>>> Also nothing else changed in that setup just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing to do with the driver or the original bug you
>>>>> reported. The problem is that SME is active, and that is currently
>>>>> not supported at all with that hardware.
>>>>
>>>> OK .. so are we now playing kernel and AMD hardware roulette on each
>>>> release?
>>>>
>>>> SME was enabled like this with kernel 4.16.x here and everything worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> If that really worked before, then the commit most likely to have
>> broken this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou <david1.zhou@xxxxxxx>
>> Date: Fri Feb 9 10:44:09 2018 +0800
>>
>> drm/amdgpu: only enable swiotlb alloc when need v2
>>
>> get the max io mapping address of system memory to see if it is over
>> our card accessing range.
>> v2: move checking later
>>
>> Signed-off-by: Chunming Zhou <david1.zhou@xxxxxxx>
>> Reviewed-by: Monk Liu <monk.liu@xxxxxxx>
>> Reviewed-by: Christian König <christian.koenig@xxxxxxx>
>> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>>
>> Currently looking into how we could somehow improve this detection.
>
> I guess this could fit for Gabriel, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).

I got strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and never broke the GPU like this.

Here is a 4.16.13 boot dmesg that shows no such issue:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt

With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

>
>
> --
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Mesa and X developer