Re: framebuffer corruption due to overlapping stp instructions on arm64

From: Matt Sealey
Date: Thu Aug 02 2018 - 16:49:31 EST


The easiest explanation for this would be that the memory isn't mapped correctly. You can't use PCIe memory spaces with anything other than Device-nGnRE or stricter mappings. That's just a consequence of the differences between the AMBA and PCIe (posted/non-posted) memory models.

Normal memory (cacheable or uncacheable, which Linux tends to call "memory" and "writecombine" respectively) is not a good idea here.

There are two options. The first is to make sure the framebuffer is mapped as Device memory, by Links, by the driver, or by both; that only aligned accesses happen (otherwise you'll just get a synchronous exception); and that there isn't a Normal memory alias of the same range.
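As a rough illustration of the "aligned accesses only" requirement, here is a minimal C sketch of what an application-side copy and fill into a Device-mapped framebuffer could look like. It assumes a 32 bits-per-pixel framebuffer that is already mapped as Device memory; the helper names `fb_copy_aligned32` and `fb_fill_aligned32` are hypothetical and not part of Links or any kernel API. Each destination word is written exactly once, with a naturally aligned 32-bit volatile store, so there are no unaligned or overlapping writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit pixels into a framebuffer that is
 * assumed to be mapped as Device memory. Each destination word is written
 * exactly once, in ascending address order, through a volatile pointer, so
 * the compiler must emit one store per element rather than merging stores
 * into wider, possibly unaligned or overlapping ones. */
static void fb_copy_aligned32(volatile uint32_t *dst, const uint32_t *src,
                              size_t nwords)
{
    /* Unaligned accesses to Device memory fault on arm64, so require
     * 4-byte alignment of the destination up front. */
    assert(((uintptr_t)dst & 3u) == 0);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}

/* The same idea for memset-style fills with a single 32-bit pixel value. */
static void fb_fill_aligned32(volatile uint32_t *dst, uint32_t pixel,
                              size_t nwords)
{
    assert(((uintptr_t)dst & 3u) == 0);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = pixel;
}
```

The design trades speed for predictability: a plain loop of word stores is slower than an optimised memcpy, but it never issues a store that touches the same bytes twice or straddles an alignment boundary.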

Alternatively, tell the PCIe driver that the framebuffer is in system memory. You can then map it however you like; there'll be a performance hit if you start to use GPU acceleration, but a significant performance boost from the point of view of the CPU. Only memory accessed through the PCIe master interface (i.e. reads and writes generated by the card itself, such as telling the GPU to pull from system memory, or other DMA) can be in Normal memory, and this allows PCIe to be cache coherent with the right interconnect. The slave port on a PCIe root complex (i.e. the path CPU writes take) can't be used with Normal, or otherwise reorderable, mappings, and therefore your 2GB of graphics memory is going to be slow from the point of view of the CPU.

To find the correct mapping you'll need to know just how cache coherent the PCIe RC is...

Ta,
Matt

On Thu, Aug 2, 2018 at 14:31 Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
Hi

I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a
strange problem.

When I use the links browser in graphics mode on the framebuffer, I get
occasional pixel corruption. Links does memcpy, memset and 4-byte writes
on the framebuffer - nothing else.

I found out that the pixel corruption is caused by overlapping unaligned
stp instructions inside memcpy. In order to avoid branching, the arm64
memcpy implementation may write the same destination twice with different
alignment. If I put "dmb sy" between the overlapping stp instructions, the
pixel corruption goes away.
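To make the overlapping-store pattern concrete, here is a simplified C model of the kind of sequence described above for a copy of 16 to 32 bytes: load the first 16 and the last 16 bytes of the source, then store both blocks. It is not the actual arm64 memcpy source; the function names `copy16_32` and `overlap_bytes` are hypothetical, and each `memcpy` of 16 bytes stands in for one ldp/stp pair. When n is less than 32 the two stores overlap, so the middle bytes are written twice, and the second store lands at an address that is generally not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a branch-free small copy, valid for 16 <= n <= 32. */
static void copy16_32(uint8_t *dst, const uint8_t *src, size_t n)
{
    uint8_t head[16], tail[16];

    assert(n >= 16 && n <= 32);
    memcpy(head, src, 16);           /* ~ ldp from src            */
    memcpy(tail, src + n - 16, 16);  /* ~ ldp from src + n - 16   */
    memcpy(dst, head, 16);           /* ~ stp to dst              */
    memcpy(dst + n - 16, tail, 16);  /* ~ stp to dst + n - 16:
                                        overlaps the first store
                                        whenever n < 32           */
}

/* Number of destination bytes written by both 16-byte stores. */
static size_t overlap_bytes(size_t n)
{
    return n >= 32 ? 0 : 32 - n;
}
```

For example, with n = 20 the first store covers bytes 0..15 and the second covers bytes 4..19, so 12 bytes are written twice. On ordinary RAM the second write simply wins, which is why this trick is safe for normal memory and only becomes visible when the destination does not behave like memory.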

This seems like a hardware bug. Is it a known erratum? Do you have any
workarounds for it?

I tried an AMD card (HD 6350) and an NVidia card (NVS 285) and both exhibit
the same corruption. OpenGL doesn't work (it results in artifacts on the AMD
card and a lock-up on the NVidia card), but that's quite expected if even
simple writing to the framebuffer doesn't work.

Mikulas