Re: [RFC 0/2] Stop the abuse of Linux-* _OSI strings

From: Karol Herbst
Date: Fri Aug 19 2022 - 13:44:08 EST


On Fri, Aug 19, 2022 at 6:43 PM Limonciello, Mario
<mario.limonciello@xxxxxxx> wrote:
>
> On 8/19/2022 11:37, Karol Herbst wrote:
> > On Fri, Aug 19, 2022 at 6:00 PM Limonciello, Mario
> > <mario.limonciello@xxxxxxx> wrote:
> >>
> >> On 8/19/2022 10:44, Karol Herbst wrote:
> >>> On Fri, Aug 19, 2022 at 4:25 PM Mario Limonciello
> >>> <mario.limonciello@xxxxxxx> wrote:
> >>>>
> >>>> 3 _OSI strings were introduced in recent years that were intended
> >>>> to work around very specific problems found on specific systems.
> >>>>
> >>>> The idea was supposed to be that these quirks were only used on
> >>>> those systems, but this proved to be a bad assumption. I've found
> >>>> at least one system in the wild where the vendor using the _OSI
> >>>> string doesn't match the _OSI string and neither does the use.
> >>>>
> >>>> So this is a good time to review whether to keep those strings in the kernel.
> >>>> There are 3 strings that were introduced:
> >>>>
> >>>> Linux-Dell-Video
> >>>> -> Intended for systems with NVIDIA cards that didn't support RTD3
> >>>> Linux-Lenovo-NV-HDMI-Audio
> >>>> -> Intended for powering on NVIDIA HDMI device
> >>>> Linux-HPI-Hybrid-Graphics
> >>>> -> Intended for changing dGPU output
> >>>>
> >>>> AFAIK the first string is no longer relevant as nouveau now supports
> >>>> RTD3. If that's wrong, this can be changed for the series.
> >>>>
> >>>
> >>> Nouveau always supported RTD3, because that's mainly a kernel feature.
> >>> When those were introduced we simply had a bug only hit on a few
> >>> systems. And instead of helping us to debug this, this workaround was
> >>> added :( We were not even asked about this.
> >>
> >> My apologies, I was certainly part of the impetus for this W/A in the
> >> first place while I was at my previous employer. Your comment
> >> re-affirms to me that at least the first patch is correct.
> >>
> >
> > Yeah, no worries. I just hope that people in the future will
> > communicate such things.
> >
> > Anyway, there are a few issues with the runpm stuff left, and looking
> > at what nvidia does in their open driver makes me wonder if we might
> > need a bigger overhaul of runpm. They do apply bridge/host controller
> > specific workarounds and I suspect some of them are related here. The
> > workaround I came up with in nouveau can be seen in 434fdb51513bf.
>
> But this overhaul shouldn't gate removing this _OSI string, or do you
> think it should?
>

Hard to tell. If there are affected systems where those _OSI strings
are in place and hide the problem, this would be annoying, but then we
might have more pointers on what's actually broken. Anyway, we don't
need those workarounds but rather a real fix for all those issues. And
I suspect the real fix is to apply specific workarounds for specific
systems.

> >
> > But also having access to documentation/specification from what Nvidia
> > is doing would be quite helpful. We know that on some really new AMD
> > systems we run into new issues and this needs some investigation. I
> > simply don't have access to any laptops where this problem can be seen.
> >
>
> Do you mean there are specifically remaining issues on AMD APU + NVIDIA
> dGPU systems? Any public bugs by chance?
>
> Depending on what these are I'm happy to try to help with at least
> access. If we have them maybe we can try to make the right connections
> to get some hardware to you, or at least remotely access it.
>

https://gitlab.freedesktop.org/drm/nouveau/-/issues/108

There might be more, but this should be a good start.

> >>>
> >>> I am a bit curious about the other two though as I am not even sure
> >>> they are needed at all as we put other workarounds in place. @Lyude
> >>> Paul might know more about these.
> >>>
> >>
> >> If the other two really aren't needed anymore, then yeah we should just
> >> tear all 3 out. If that's the direction we go, I would appreciate some
> >> commit IDs to reference in the commit message for tearing them out so
> >> that if they end up backported to stable we know how far they should go.
> >>
> >
>