RE: On disabling AGP without working alternative (PCI and PCIe are also affected)

From: Deucher, Alexander
Date: Thu May 13 2021 - 17:18:23 EST


[AMD Public Use]

> -----Original Message-----
> From: Thomas “illwieckz“ Debesse <dev@xxxxxxxxxxxxx>
> Sent: Thursday, May 13, 2021 4:46 PM
> To: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; LKML <linux-kernel@xxxxxxxxxxxxxxx>
> Subject: Re: On disabling AGP without working alternative (PCI and PCIe are
> also affected)
>
> Le 13/05/2021 à 21:02, Deucher, Alexander a écrit :
> > [AMD Public Use]
> >
> > I don't think I have a functional AGP system anymore, but I do have PCIe
> capable systems and they work fine.
> > Does this patch[1], help by any chance? The change to add support for
> > root ports with addressing limitations seemed to break a lot of old systems,
> but never really got resolved. If not, your best bet is probably to try and
> bisect if something broke your system(s).
> >
> > Alex
> >
> > [1] -
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> w.
> > spinics.net%2Flists%2Famd-
> gfx%2Fmsg52961.html&amp;data=04%7C01%7CAlexa
> >
> nder.Deucher%40amd.com%7Cec9ae4ac2229473707a708d916502bf7%7C3dd
> 8961fe4
> >
> 884e608e11a82d994e183d%7C0%7C0%7C637565356504234517%7CUnknown
> %7CTWFpbG
> >
> Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> Mn0%
> >
> 3D%7C3000&amp;sdata=PcvPmesLXI26VqPct8hdPxQzxC%2BqY4wFtbTYvwjw
> 6eM%3D&a
> > mp;reserved=0
>
> The more modern PCIe systems seems to not be affected when running
> PCIe cards. I would not be surprised if modern PCIe hosts rely on features
> that were supported in the past and then, the old features are not really
> tested.
>
> For example while reading the Linux code in October I noticed the code was
> referencing different mask lenght, what if only the implementation for the
> newer length works or something like that?
>
> But well, the patch you linked is touching the exact code that made me
> wondering about it:
>
> ```
> dma_bits = 40;
> if (rdev->flags & RADEON_IS_AGP)
> dma_bits = 32;
> if ((rdev->flags & RADEON_IS_PCI) &&
> (rdev->family <= CHIP_RS740))
> dma_bits = 32;
> ```
>
> If I'm right this code sets this value to 40 by default, then sets it to
> 32 if GPU is AGP or if GPU is PCI and identifier is smaller or equal to RS740.
>
> I see no RADEON_IS_PCIE so I assume both PCIe and AGP cards running with
> radeon.agpmode -1 with identifiers greater than RS740 are probably keeping
> this value as 40.
>
> It's interesting to notice the PCI HD 4350 (RV710) will use 40 bits, given it is
> after RS740 in drivers/gpu/drm/radeon/radeon_family.h
>
> If an AGP card is running with radeon.agpmode = -1, how is it reported,
> RADEON_IS_AGP or RADEON_IS_PCI?

It depends on the asic. See radeon_agp_disable(), but it doesn't really matter. The driver doesn't really care, it's all PCI at the end of the day.
The only thing the driver really cares about is whether it will be using the AGP remapper in the chipset for accessing system memory, or whether it will be
using its own built in remapper on the GPU itself.

>
> If RADEON_IS_PCI, the AGP Radeon HD4670 (RV730) will use 40 bits, given it
> is after RS740 in drivers/gpu/drm/radeon/radeon_family.h
>
> I had some memories of having tried to force everything to 32 in that part of
> the code, but then, I still got problems but different ones.

The bits here refer to the addressing capabilities of the device. How many address bits can they handle for DMA. It's baked into the hardware. Device drivers
report the address limits of the device to the kernel so that DMA API will give them memory within the range of addresses they can access.

>
> From
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.
> org%2Flkml%2F2020%2F11%2F9%2F1054&amp;data=04%7C01%7CAlexander.
> Deucher%40amd.com%7Cec9ae4ac2229473707a708d916502bf7%7C3dd8961f
> e4884e608e11a82d994e183d%7C0%7C0%7C637565356504234517%7CUnknow
> n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> WwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=BUZ8dBLzzltv0N2RTSwdz%2BlT5
> cQgopYgdropej2FINE%3D&amp;reserved=0:
>
> > ## drm/radeon: make all PCI GPUs use 32 bits DMA bit mask
> >
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml
> >
> .org%2Flkml%2F2020%2F11%2F5%2F307&amp;data=04%7C01%7CAlexander.
> Deucher
> >
> %40amd.com%7Cec9ae4ac2229473707a708d916502bf7%7C3dd8961fe4884e6
> 08e11a8
> >
> 2d994e183d%7C0%7C0%7C637565356504234517%7CUnknown%7CTWFpbGZs
> b3d8eyJWIj
> >
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3
> 000&am
> >
> p;sdata=1ihVkGgLeFq9IXWzXMMxQHGFhllG5RvgPQ%2BOkZY6dq8%3D&amp
> ;reserved=
> > 0
> >
> > This one is not enough to fix PCI GPUs but it is enough to prevent to
> > fail r600_ring_test on ATI PCI devices. Note that Nvidia PCI GPUs
> > can't be fixed by this, and this uncovers other bug with AGP GPUs when
> > AGP is disabled at build time. Also, this patch may makes PCI GPUs
> > working on a non-optimal way on platform that accepts them with 40-bit
> > DMA bit mask (like Intel-based computers that already work without any
> > patch).
>
> So I was wondering if there was a similar issue elsewhere in the code.

Note that platforms can also impose limitations on DMA even if a device may be more capable. That is what Christoph's patch attempted to address.
The patch you proposed above more or less a partial revert of the same patch I referenced in my last reply.

Alex

>
> I see the patch at
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww
> w.spinics.net%2Flists%2Famd-
> gfx%2Fmsg52961.html&amp;data=04%7C01%7CAlexander.Deucher%40amd.
> com%7Cec9ae4ac2229473707a708d916502bf7%7C3dd8961fe4884e608e11a82
> d994e183d%7C0%7C0%7C637565356504234517%7CUnknown%7CTWFpbGZsb
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> %3D%7C3000&amp;sdata=PcvPmesLXI26VqPct8hdPxQzxC%2BqY4wFtbTYvwj
> w6eM%3D&amp;reserved=0
> is also setting in that code another variable I haven't touched:
> rdev->need_dma32
>
> I'll try to set both dma_bits to 32 and rdev->need_dma32 unconditionally and
> see if I notice a difference with this or that GPU.
>
> Note that the issue with the PCI HD 4350 (RV710) does not need an AGP host
> to be tested, only an AMD host (reproduced from K8 to Piledriver), but
> unfortunately now the PCI variant of this card seems to be very hard to find
> (I doubt the PCIe one is affected).
>
> Thank you for your answer and you attention!
>
> --
> Thomas “illwieckz” Debesse
> I wish to be personally CC'ed the answers/comments posted to the list in
> response to my posting.