Re: [PATCH v8 0/8] Add support for SVM atomics in Nouveau

From: Alistair Popple
Date: Thu May 06 2021 - 03:44:05 EST


Hi Andrew,

There is currently no outstanding feedback for this series, so I am hoping it
may be considered for inclusion (or at least the mm portions; I still need
Reviews/Acks for the Nouveau bits). The main change for v8, made in response
to feedback from Jason, was to remove entries on fork rather than copy them,
so any follow-up comments on patch 5 would also be welcome. The series
contains a number of general clean-ups suggested by Christoph along with a
feature to temporarily make selected user page mappings write-protected.

This is needed to support OpenCL atomic operations in Nouveau to shared
virtual memory (SVM) regions allocated with the CL_MEM_SVM_ATOMICS clSVMAlloc
flag. A more complete description of the OpenCL SVM feature is available at
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_shared_virtual_memory .

I have been testing this with Mesa 21.1.0 and a simple OpenCL program which
checks that GPU atomic accesses to system memory are in fact atomic. Without
this series the test fails because there is no way of write-protecting the
userspace page mapping, so the device clobbers CPU writes. For reference, the
test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/ .
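
To illustrate the failure mode described above, here is a minimal userspace C
sketch. It is not the actual test (that is the OpenCL program at the URL
above), and it is deterministic rather than concurrent. It models the device's
atomic increment as a separate read and write-back, so a CPU store landing in
between is lost. All names (toy_rmw, toy_counter_after and so on) are made up
for illustration and do not exist in the kernel, Nouveau or Mesa.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device read-modify-write done in two steps. */
struct toy_rmw {
	int latched;	/* value the "device" read at the start of the op */
};

/* Step 1: the device reads the current value from system memory. */
static void toy_dev_read(struct toy_rmw *op, const int *mem)
{
	op->latched = *mem;
}

/* Step 2: the device writes back the value it read, plus one. */
static void toy_dev_write_back(const struct toy_rmw *op, int *mem)
{
	*mem = op->latched + 1;
}

/*
 * One device increment plus one CPU increment of a counter starting at 0.
 * If cpu_in_window is true the CPU store lands between the device's read
 * and its write-back (no write protection). Otherwise the CPU store is
 * held off until the device has finished, which is the ordering that
 * write-protecting the CPU mapping is meant to enforce.
 */
int toy_counter_after(bool cpu_in_window)
{
	int mem = 0;
	struct toy_rmw op;

	toy_dev_read(&op, &mem);
	assert(op.latched == 0);	/* device saw the initial value */

	if (cpu_in_window)
		mem += 1;		/* CPU store inside the window */

	toy_dev_write_back(&op, &mem);

	if (!cpu_in_window)
		mem += 1;		/* CPU store after the device is done */

	return mem;
}
```

With this model toy_counter_after(true) returns 1, i.e. the CPU increment has
been clobbered, whereas toy_counter_after(false) returns the expected 2.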

- Alistair

On Wednesday, 7 April 2021 6:42:30 PM AEST Alistair Popple wrote:
> This is the eighth version of a series to add support to Nouveau for atomic
> memory operations on OpenCL shared virtual memory (SVM) regions.
>
> The main change for this version is a simplification of device exclusive
> entry handling. Entries for copy-on-write mappings are now removed during
> fork rather than copied. This is safer because there could be unique
> corner cases when copying, particularly for pinned pages which should
> follow the same logic as copy_present_page(). Removing entries avoids this
> possibility by treating them as normal ptes.
>
> Exclusive device access is implemented by adding a new swap entry type
> (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
> difference is that on fault the original entry is immediately restored by
> the fault handler instead of waiting.
>
> Restoring the entry triggers calls to MMU notifiers, which allow a device
> driver to revoke the atomic access permission from the GPU prior to the CPU
> finalising the entry.
>
> Patches 1 & 2 refactor existing migration and device private entry
> functions.
>
> Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
> functionality into separate functions - try_to_migrate_one() and
> try_to_munlock_one(). These should not change any functionality, but any
> help testing would be much appreciated as I have not been able to test
> every usage of try_to_unmap_one().
>
> Patch 5 contains the bulk of the implementation for device exclusive
> memory.
>
> Patch 6 contains some additions to the HMM selftests to ensure everything
> works as expected.
>
> Patch 7 is a cleanup for the Nouveau SVM implementation.
>
> Patch 8 contains the implementation of atomic access for the Nouveau
> driver.
>
> This has been tested using the latest upstream Mesa userspace with a simple
> OpenCL test program which checks the results of atomic GPU operations on an
> SVM buffer whilst also writing to the same buffer from the CPU.
>
> Alistair Popple (8):
>   mm: Remove special swap entry functions
>   mm/swapops: Rework swap entry manipulation code
>   mm/rmap: Split try_to_munlock from try_to_unmap
>   mm/rmap: Split migration into its own function
>   mm: Device exclusive memory access
>   mm: Selftests for exclusive device memory
>   nouveau/svm: Refactor nouveau_range_fault
>   nouveau/svm: Implement atomic SVM access
>
>  Documentation/vm/hmm.rst                      |  19 +-
>  Documentation/vm/unevictable-lru.rst          |  33 +-
>  arch/s390/mm/pgtable.c                        |   2 +-
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c         | 156 ++++-
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
>  fs/proc/task_mmu.c                            |  23 +-
>  include/linux/mmu_notifier.h                  |  26 +-
>  include/linux/rmap.h                          |  11 +-
>  include/linux/swap.h                          |   8 +-
>  include/linux/swapops.h                       | 123 ++--
>  lib/test_hmm.c                                | 126 +++-
>  lib/test_hmm_uapi.h                           |   2 +
>  mm/debug_vm_pgtable.c                         |  12 +-
>  mm/hmm.c                                      |  12 +-
>  mm/huge_memory.c                              |  45 +-
>  mm/hugetlb.c                                  |  10 +-
>  mm/memcontrol.c                               |   2 +-
>  mm/memory.c                                   | 196 +++++-
>  mm/migrate.c                                  |  51 +-
>  mm/mlock.c                                    |  10 +-
>  mm/mprotect.c                                 |  18 +-
>  mm/page_vma_mapped.c                          |  15 +-
>  mm/rmap.c                                     | 612 +++++++++++++++---
>  tools/testing/selftests/vm/hmm-tests.c        | 158 +++++
>  26 files changed, 1366 insertions(+), 312 deletions(-)
>
>
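
For anyone skimming, the device exclusive flow described in the quoted cover
letter can be sketched as a small userspace state machine. This is only a toy
model under my own simplifying assumptions: it has no locking, no real page
tables and no real MMU notifier API, and every type and function name in it
(toy_mapping, toy_cpu_access and so on) is hypothetical. It is meant to show
the ordering only: on a CPU access the driver callback revokes the device's
atomic access first, and the original entry is then restored immediately
rather than the fault waiting as it would for a migration entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these types and names are hypothetical, not kernel API. */
enum toy_pte_kind {
	TOY_PTE_PRESENT,		/* normal CPU mapping */
	TOY_PTE_DEVICE_EXCLUSIVE,	/* CPU mapping replaced by a special entry */
};

struct toy_mapping {
	enum toy_pte_kind kind;
	bool device_has_atomic_access;
	int revoke_calls;		/* how many times the callback fired */
};

/* Stand-in for a driver notifier callback: revoke device atomic access. */
static void toy_notifier_revoke(struct toy_mapping *m)
{
	m->device_has_atomic_access = false;
	m->revoke_calls++;
}

/* The device asks for exclusive access: swap out the present entry. */
void toy_make_device_exclusive(struct toy_mapping *m)
{
	assert(m->kind == TOY_PTE_PRESENT);
	m->kind = TOY_PTE_DEVICE_EXCLUSIVE;
	m->device_has_atomic_access = true;
}

/*
 * The CPU touches the page. If the entry is device exclusive, the "fault
 * handler" runs the notifier first and only then restores the original
 * entry. Nothing here blocks or waits.
 */
void toy_cpu_access(struct toy_mapping *m)
{
	if (m->kind != TOY_PTE_DEVICE_EXCLUSIVE)
		return;			/* ordinary access, nothing to do */

	toy_notifier_revoke(m);
	assert(!m->device_has_atomic_access);
	m->kind = TOY_PTE_PRESENT;	/* CPU entry finalised last */
}
```

Starting from a present mapping, calling toy_make_device_exclusive() and then
toy_cpu_access() leaves the mapping present again with device access revoked
and exactly one callback invocation. A second toy_cpu_access() is a no-op.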