Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

From: Thierry Reding
Date: Thu Apr 08 2021 - 09:26:15 EST


On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote:
> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
> > All consumer-grade Android and Chromebook devices show a splash screen
> > on boot and then display is left enabled when kernel is booted. This
> > behaviour is unacceptable in a case of implicit IOMMU domains to which
> > devices are attached during kernel boot since devices, like display
> > controller, may perform DMA at that time. We can work around this problem
> > by deferring the enabling of SMMU translation for specific devices,
> > like a display controller, until the first IOMMU mapping is created,
> > which works well enough in practice because by that time h/w is already
> > stopped.
> >
> > Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>
> For both patches:
> Acked-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
> Tested-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
>
> The WAR looks good to me. Perhaps Thierry would give some input.
>
> Another topic:
> I think this may help work around the mc-errors, which we have
> been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
> (attached a test patch rebasing on these two)

Ugh... that's exactly what I was afraid of. Now everybody is going to
think that we can just work around this issue with driver-specific SMMU
hacks...

> However, GPU would also report errors using DMA domain:
>
> nouveau 57000000.gpu: acr: firmware unavailable
> nouveau 57000000.gpu: pmu: firmware unavailable
> nouveau 57000000.gpu: gr: firmware unavailable
> tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffbe200: Security violation (TrustZone violation)
> nouveau 57000000.gpu: DRM: failed to create kernel channel, -22
> tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffad000: Security violation (TrustZone violation)
> nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
> nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
>
> Looking at the address, seems that GPU allocated memory in 32-bit
> physical address space behind SMMU, so a violation happened after
> turning on DMA domain I guess...

The problem with GPU is... extra complicated. You're getting these
faults because you're enabling the IOMMU-backed DMA API, which then
causes the Nouveau driver to allocate buffers using the DMA API instead of
explicitly allocating pages and then mapping them using the IOMMU API.
However, there are additional patches needed to teach Nouveau about how
to deal with SMMU and those haven't been merged yet. I've got prototypes
of this, but before the whole framebuffer carveout passing work makes
progress there's little sense in moving individual pieces forward.

One more reason not to try and cut corners. We know what the right solution is,
even if it takes a lot of work. I'm willing to ack this patch, or some
version of it, but only as a way of working around things we have no
realistic chance of fixing properly anymore. I still think it would be
best if we could derive identity mappings from command-line arguments on
these platforms because I think most of them will actually set that, and
then the solution becomes at least uniform at the SMMU level.

For Tegra210 I've already laid out a path to a solution that's going to
be generic and extend to Tegra186 and later as well.

Thierry
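
For readers following the thread, the deferral described in the quoted
commit message can be modeled roughly as below. This is a self-contained
toy sketch, not the tegra-smmu code from the patch: every name in it
(struct client, struct domain, domain_attach, domain_map,
enable_translation) is hypothetical, and it only illustrates the ordering,
namely that a deferred client is recorded at attach time but translation
is switched on only when the first mapping is created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one SMMU client, e.g. a display controller. */
struct client {
	const char *name;
	bool defer_attach;   /* may still be scanning out a splash screen */
	bool attached;       /* client has been added to a domain */
	bool translation_on; /* SMMU translation enabled for this client */
};

/* Hypothetical model of an IOMMU domain with a few attached clients. */
struct domain {
	struct client *clients[4];
	size_t num_clients;
	size_t num_mappings;
};

static void enable_translation(struct client *c)
{
	c->translation_on = true;
}

/* Attach: always record the client; skip enabling if it is deferred. */
static int domain_attach(struct domain *d, struct client *c)
{
	if (d->num_clients >= sizeof(d->clients) / sizeof(d->clients[0]))
		return -1;

	d->clients[d->num_clients++] = c;
	c->attached = true;

	if (!c->defer_attach)
		enable_translation(c);

	return 0;
}

/* Map: the first mapping enables translation for deferred clients. */
static void domain_map(struct domain *d)
{
	size_t i;

	if (d->num_mappings++ > 0)
		return;

	for (i = 0; i < d->num_clients; i++)
		if (!d->clients[i]->translation_on)
			enable_translation(d->clients[i]);
}
```

In this model a client with defer_attach set keeps bypassing translation
between domain_attach() and the first domain_map() call, which is the
window in which the bootloader-enabled display is assumed to be stopped.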
