Re: [PATCH 00/12] drm/nouveau: support for GK20A, cont'd

From: Alexandre Courbot
Date: Wed Mar 26 2014 - 02:34:18 EST


Hi Lucas,

On Mon, Mar 24, 2014 at 10:19 PM, Lucas Stach <l.stach@xxxxxxxxxxxxxx> wrote:
> Hi Alexandre,
>
> On Monday, 24.03.2014 at 17:42 +0900, Alexandre Courbot wrote:
>> Hi everyone,
> [...]
>>
>> A few lines of hacks (not included here) are still needed to deal with cached
>> mappings triggering external aborts and CPU/GPU memory coherency issues, but I
>> hope to understand and address these issues next.
>
> For the coherency issue part you may want to look at my Nouveau on ARM
> series. Most of it never made it upstream, as I lacked the time to work
> further on this, but it solves the coherency issue from the kernel.

Oh, thanks for pointing this out, it will probably be most useful.
Shall I assume the patches at
https://www.mail-archive.com/nouveau@xxxxxxxxxxxxxxxxxxxxx/msg13557.html
are up-to-date? Would you mind if I include the relevant patches of
yours in the next iteration of this series?

>
> It does so by doing the necessary manual cache flushes/invalidates on
> buffer access, so costs some performance. To avoid this you really want
> to get writecombined mappings into the kernel<->userspace interface.
> Simply mapping the pushbuf as WC/US has brought a 7% performance
> increase in OpenArena when I last tested this. This test was done with
> only one PCIe lane, so the perf increase may be even better with a more
> adequate interconnect.

Interestingly, if I allow writecombined mappings in the kernel, I get
faults when attempting to read the mapped area:

[ 78.074854] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf003e010
...
[ 78.337862] [<c03491a8>] (nouveau_bo_rd32) from [<c0346374>] (nouveau_fence_update+0x5c/0x80)
[ 78.352536] [<c0346374>] (nouveau_fence_update) from [<c03463b0>] (nouveau_fence_done+0x18/0x28)
[ 78.367531] [<c03463b0>] (nouveau_fence_done) from [<c02b852c>] (ttm_bo_wait+0x104/0x184)
[ 78.381915] [<c02b852c>] (ttm_bo_wait) from [<c034c718>] (nouveau_gem_ioctl_cpu_prep+0x40/0xe8)
[ 78.396849] [<c034c718>] (nouveau_gem_ioctl_cpu_prep) from [<c029fd5c>] (drm_ioctl+0x404/0x4b8)
[ 78.411790] [<c029fd5c>] (drm_ioctl) from [<c0343960>] (nouveau_drm_ioctl+0x54/0x80)
[ 78.425805] [<c0343960>] (nouveau_drm_ioctl) from [<c00ea5ec>] (do_vfs_ioctl+0x3f0/0x5bc)
[ 78.440277] [<c00ea5ec>] (do_vfs_ioctl) from [<c00ea7ec>] (SyS_ioctl+0x34/0x5c)
[ 78.453918] [<c00ea7ec>] (SyS_ioctl) from [<c000e5a0>] (ret_fast_syscall+0x0/0x30)
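
For reference, the 0x1008 in the first line of the trace is the fault
status value reported with the abort. Below is a minimal sketch of how
it decodes, assuming the short-descriptor FSR layout used by 32-bit ARM.
The macro and function names are made up for illustration and are not
copied from any kernel header:

```c
#include <stdbool.h>

/* Bit fields of the fault status register, short-descriptor format
 * (assumed layout; the names below are illustrative only). */
#define FSR_FS3_0  0xfu        /* fault status, bits [3:0] */
#define FSR_FS4    (1u << 10)  /* fault status, bit 4 */
#define FSR_WRITE  (1u << 11)  /* set when the faulting access was a write */
#define FSR_EXT    (1u << 12)  /* external abort type, implementation defined */

/* Rebuild the 5-bit fault status from its two fields. */
unsigned int fsr_fault_status(unsigned int fsr)
{
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* True if the aborted access was a write, false for a read. */
bool fsr_is_write(unsigned int fsr)
{
    return (fsr & FSR_WRITE) != 0;
}

/* True if the implementation-defined external abort type bit is set. */
bool fsr_ext_bit(unsigned int fsr)
{
    return (fsr & FSR_EXT) != 0;
}
```

With fsr = 0x1008 this gives a fault status of 8 (the entry reported as
"external abort on non-linefetch"), a read access, and the external
abort type bit set, which matches a fault taken inside nouveau_bo_rd32().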

To avoid these I need to set the VRAM default_caching to
TTM_PL_FLAG_UNCACHED. It is not clear to me why this is needed. Since
the BOs are accessed through the BAR, they are correctly considered as
IO memory and mapped using ttm_bo_ioremap(), so the fault really seems
to come from the WC mapping itself.
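
To make the workaround concrete, here is a rough stand-alone sketch of
the change, not the actual patch. The struct is a reduced, hypothetical
stand-in for the TTM memory type manager, the flag values are
placeholders so that the snippet compiles outside the kernel tree, and
vram_init_caching() and its wc_reads_fault parameter are invented for
illustration:

```c
/* Placeholder flag values; only their distinctness matters here. */
#define TTM_PL_FLAG_CACHED    (1u << 16)
#define TTM_PL_FLAG_UNCACHED  (1u << 17)
#define TTM_PL_FLAG_WC        (1u << 18)

/* Hypothetical, reduced stand-in for the TTM memory type manager. */
struct mem_type_manager {
    unsigned int available_caching; /* caching modes the type allows */
    unsigned int default_caching;   /* mode used when none is requested */
};

/*
 * Set up caching for the VRAM memory type. When WC reads through the
 * BAR fault (as in the trace above), fall back to uncached mappings by
 * default while still advertising WC as available.
 */
void vram_init_caching(struct mem_type_manager *man, int wc_reads_fault)
{
    man->available_caching = TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_WC;
    man->default_caching = wc_reads_fault ? TTM_PL_FLAG_UNCACHED
                                          : TTM_PL_FLAG_WC;
}
```

Only the default changes in this sketch; WC stays in the available
mask, so a caller that explicitly asks for it would still get a WC
mapping.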

Note that if I go ahead and force the use of pgprot_writecombine() in
ttm_io_prot() to get writecombined user-space mappings, pure DRM
programs that map a buffer and try to read it fail similarly, while
Mesa's glReadPixels() seems to be happy. I'm not sure what it does
differently here.

Cheers,
Alex.