Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

From: Thorsten Leemhuis
Date: Sun Mar 06 2022 - 10:54:19 EST


Hi, this is your Linux kernel regression tracker again. Top-posting once
more, to make this easily accessible to everyone.

What's the status of this? It looks stuck, or did the discussion
continue somewhere else? James, it sounded like you wanted to test
something, did you give it a try? Or is there some reason why I should
stop tracking this regression?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply. It's in everyone's interest to set the public record
straight.

#regzbot poke

On 16.02.22 17:37, Alex Deucher wrote:
> On Tue, Feb 15, 2022 at 9:35 PM James D. Turner
> <linuxkernel.foss@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi Alex,
>>
>>> I guess just querying the ATIF method does something that negatively
>>> influences the Windows driver in the guest. Perhaps the platform
>>> thinks the driver has been loaded since the method has been called so
>>> it enables certain behaviors that require ATIF interaction that never
>>> happen because the ACPI methods are not available in the guest.
>>
>> Do you mean the `amdgpu_atif_pci_probe_handle` function? If it would be
>> helpful, I could try disabling that function and testing again.
>
> Correct.
>
>>
>>> I don't really have a good workaround other than blacklisting the
>>> driver since on bare metal the driver needs to use this interface for
>>> platform interactions.
>>
>> I'm not familiar with ATIF, but should `amdgpu_atif_pci_probe_handle`
>> really be called for PCI devices which are bound to vfio-pci? I'd expect
>> amdgpu to ignore such devices.
>>
>> As I understand it, starting with
>> f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)"),
>> the `amdgpu_acpi_detect` function loops over all PCI devices in the
>> `PCI_CLASS_DISPLAY_VGA` and `PCI_CLASS_DISPLAY_OTHER` classes to find
>> the ATIF and ATCS handles. Maybe skipping over any PCI devices bound to
>> vfio-pci would fix the issue? On a related note, shouldn't it also skip
>> over any PCI devices with non-AMD vendor IDs?
>
> The ACPI methods are global. There's only one instance of each per
> system and they are relevant to all GPUs on the platform. That's why
> they are a global resource in the driver. They can be hung off of the
> dGPU or APU ACPI namespace, depending on the platform, which is why we
> check all of the display devices. Skipping them would prevent them
> from being available if you later bound the amdgpu driver to the GPU
> device(s) I think.
>
> Alex
>
>>
>> Regards,
>> James
>
>