Re: [PATCH 0/4] arm64: Support the TSO memory model

From: Neal Gompa
Date: Wed Apr 10 2024 - 21:39:08 EST


On Wed, Apr 10, 2024 at 8:51 PM Hector Martin <marcan@xxxxxxxxx> wrote:
>
> x86 CPUs implement a stricter memory model (TSO) than ARM64. For this
> reason, x86 emulation on baseline ARM64 systems requires very expensive
> memory model emulation. Having hardware that supports this natively is
> therefore very attractive. Such hardware, in fact, exists. This series
> adds support for userspace to identify when TSO is available and
> toggle it on, if supported.
>
> Some ARM64 CPUs intrinsically implement the TSO memory model, while
> others expose it as an IMPDEF control. Apple Silicon SoCs are in the
> latter category. Using TSO for x86 emulation on chips that support it
> has been shown to provide a massive performance boost [1].
>
> Patch 1 introduces the PR_{SET,GET}_MEM_MODEL userspace control, which
> is initially not implemented for any architectures.
>
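For anyone wiring this up in an emulator, here is a rough userspace sketch of how the new prctl might be probed. The constant values are placeholders I made up for illustration, so take the real ones from the uapi header in patch 1. I'm also assuming that an unsupported kernel or CPU reports EINVAL:

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Hypothetical placeholder values: prefer the definitions from the
 * patched <linux/prctl.h> when building against it. */
#ifndef PR_SET_MEM_MODEL
#define PR_SET_MEM_MODEL 0x4d4d444c
#endif
#ifndef PR_SET_MEM_MODEL_TSO
#define PR_SET_MEM_MODEL_TSO 1
#endif

enum tso_status { TSO_ENABLED, TSO_UNSUPPORTED, TSO_ERROR };

/* Map a prctl() result to a status. Assumption: EINVAL means the kernel
 * or the CPU does not support the requested memory model. */
enum tso_status classify_set_result(int ret, int err)
{
    if (ret == 0)
        return TSO_ENABLED;
    if (err == EINVAL)
        return TSO_UNSUPPORTED;
    return TSO_ERROR;
}

/* Ask the kernel to switch to TSO (per-thread vs per-process scope is an
 * assumption to check against patch 1). Unused prctl arguments are 0. */
enum tso_status try_enable_tso(void)
{
    int ret = prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
                    0UL, 0UL, 0UL);
    return classify_set_result(ret, ret == 0 ? 0 : errno);
}
```

An emulator would presumably call try_enable_tso() once at startup and keep its software memory-ordering path whenever it gets TSO_UNSUPPORTED back.
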
> Patch 2 implements it for CPUs which are known, to the best of my
> knowledge, to always implement the TSO memory model unconditionally.
> This uses the cpufeature mechanism to only enable this if *all* cores in
> the system meet the requirements.
>
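To spell out the "all cores" rule: below is a simplified model of it, not the kernel's cpufeature API. It matches each CPU's MIDR_EL1 implementer and part number against an allowlist and reports the feature only if every CPU matches. The allowlist entries are hypothetical placeholders, not the CPUs the patch actually lists:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 layout: Implementer [31:24], Variant [23:20],
 * Architecture [19:16], PartNum [15:4], Revision [3:0]. */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfffu)

struct cpu_model {
    uint32_t implementer;
    uint32_t partnum;
};

/* Hypothetical allowlist: placeholder entries only. */
static const struct cpu_model always_tso_models[] = {
    { 0x12, 0x345 },
    { 0x67, 0x089 },
};

/* Match on implementer + part number, ignoring variant and revision. */
bool midr_is_always_tso(uint32_t midr)
{
    size_t n = sizeof(always_tso_models) / sizeof(always_tso_models[0]);
    for (size_t i = 0; i < n; i++) {
        if (MIDR_IMPLEMENTER(midr) == always_tso_models[i].implementer &&
            MIDR_PARTNUM(midr) == always_tso_models[i].partnum)
            return true;
    }
    return false;
}

/* System-wide decision: a single non-matching core disables the feature. */
bool system_has_always_tso(const uint32_t *midrs, size_t ncpus)
{
    assert(midrs != NULL || ncpus == 0);
    if (ncpus == 0)
        return false;
    for (size_t i = 0; i < ncpus; i++)
        if (!midr_is_always_tso(midrs[i]))
            return false;
    return true;
}
```
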
> Patch 3 adds the scaffolding necessary to save/restore the ACTLR_EL1
> register across context switches. This register contains IMPDEF flags
> related to CPU execution, and on Apple CPUs this is where the runtime
> TSO toggle bit is implemented. Other CPUs could conceivably benefit from
> this scaffolding if they also use ACTLR_EL1 for things that could
> ostensibly be runtime controlled and context-switched. For this to work,
> ACTLR_EL1 must have a uniform layout across all cores in the system.
>
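As I read it, the save/restore amounts to roughly the following. This is a userspace model only: hw_actlr stands in for the per-CPU ACTLR_EL1 register, which isn't accessible from EL0. All names are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the live ACTLR_EL1 value on the current CPU. */
static uint64_t hw_actlr;

struct thread_state {
    uint64_t actlr; /* saved copy of this thread's ACTLR_EL1 value */
};

/* Model of a per-thread change (e.g. from the prctl): update both the
 * saved copy and the live register so they stay in sync. */
void set_current_actlr(struct thread_state *cur, uint64_t val)
{
    cur->actlr = val;
    hw_actlr = val;
}

/* Context switch: save the outgoing thread's value, then load the
 * incoming one, skipping the register write if nothing changes. */
void switch_actlr(struct thread_state *prev, struct thread_state *next)
{
    assert(prev != NULL && next != NULL);
    prev->actlr = hw_actlr;
    if (next->actlr != hw_actlr)
        hw_actlr = next->actlr;
}
```
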
> Finally, patch 4 implements PR_{SET,GET}_MEM_MODEL for Apple CPUs by
> hooking it up to flip the appropriate ACTLR_EL1 bit when the Apple TSO
> feature is detected (on all CPUs, which also implies the uniform
> ACTLR_EL1 layout).
>
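The toggle itself then boils down to a read-modify-write of a single bit that leaves the other IMPDEF bits alone. The bit position here is a placeholder, since the cover letter doesn't say which ACTLR_EL1 bit Apple uses:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for the TSO control. */
#define ACTLR_TSO_BIT  1u
#define ACTLR_TSO_MASK (UINT64_C(1) << ACTLR_TSO_BIT)

static_assert(ACTLR_TSO_BIT < 64, "TSO bit must fit in a 64-bit register");

/* Return `actlr` with only the TSO bit changed; all other bits are kept. */
uint64_t actlr_set_tso(uint64_t actlr, bool enable)
{
    return enable ? (actlr | ACTLR_TSO_MASK) : (actlr & ~ACTLR_TSO_MASK);
}

bool actlr_tso_enabled(uint64_t actlr)
{
    return (actlr & ACTLR_TSO_MASK) != 0;
}
```
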
> This series has been brewing in the downstream Asahi Linux tree for a
> while now, and ships to thousands of users. A subset have been using it
> with FEX-Emu, which already supports this feature. This rebase on
> v6.9-rc1 is only build-tested (all intermediate commits with and without
> the config enabled, on ARM64), but I'll update the downstream branch [2]
> soon with this version and get it pushed out to users/testers.
>
> The Apple support works on bare metal and *should* work exactly the same
> way on macOS VMs (as alluded to by Zayd in his independent submission [3]),
> though I haven't personally verified this. KVM support for this is left
> for a future patchset.
>
> (Apologies for the large Cc: list; I want to make sure nobody who got
> Cced on Zayd's alternate take is left out of this one.)
>
> [1] https://fex-emu.com/FEX-2306/
> [2] https://github.com/AsahiLinux/linux/tree/bits/220-tso
> [3] https://lore.kernel.org/lkml/20240410211652.16640-1-zayd_qumsieh@xxxxxxxxx/
>
> To: Catalin Marinas <catalin.marinas@xxxxxxx>
> To: Will Deacon <will@xxxxxxxxxx>
> To: Marc Zyngier <maz@xxxxxxxxxx>
> To: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Zayd Qumsieh <zayd_qumsieh@xxxxxxxxx>
> Cc: Justin Lu <ih_justin@xxxxxxxxx>
> Cc: Ryan Houdek <Houdek.Ryan@xxxxxxxxxxx>
> Cc: Mark Brown <broonie@xxxxxxxxxx>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: Mateusz Guzik <mjguzik@xxxxxxxxx>
> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> Cc: Oliver Upton <oliver.upton@xxxxxxxxx>
> Cc: Miguel Luis <miguel.luis@xxxxxxxxxx>
> Cc: Joey Gouly <joey.gouly@xxxxxxx>
> Cc: Christoph Paasch <cpaasch@xxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Sami Tolvanen <samitolvanen@xxxxxxxxxx>
> Cc: Baoquan He <bhe@xxxxxxxxxx>
> Cc: Joel Granados <j.granados@xxxxxxxxxxx>
> Cc: Dawei Li <dawei.li@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Florent Revest <revest@xxxxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Stefan Roesch <shr@xxxxxxxxxxxx>
> Cc: Andy Chiu <andy.chiu@xxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Helge Deller <deller@xxxxxx>
> Cc: Zev Weiss <zev@xxxxxxxxxxxxxxxxx>
> Cc: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> Cc: Miguel Ojeda <ojeda@xxxxxxxxxx>
> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: Asahi Linux <asahi@xxxxxxxxxxxxxxx>
>
> Signed-off-by: Hector Martin <marcan@xxxxxxxxx>
> ---
> Hector Martin (4):
> prctl: Introduce PR_{SET,GET}_MEM_MODEL
> arm64: Implement PR_{GET,SET}_MEM_MODEL for always-TSO CPUs
> arm64: Introduce scaffolding to add ACTLR_EL1 to thread state
> arm64: Implement Apple IMPDEF TSO memory model control
>
> arch/arm64/Kconfig | 14 ++++++
> arch/arm64/include/asm/apple_cpufeature.h | 15 +++++++
> arch/arm64/include/asm/cpufeature.h | 10 +++++
> arch/arm64/include/asm/processor.h | 3 ++
> arch/arm64/kernel/Makefile | 3 +-
> arch/arm64/kernel/cpufeature.c | 11 ++---
> arch/arm64/kernel/cpufeature_impdef.c | 61 ++++++++++++++++++++++++++
> arch/arm64/kernel/process.c | 71 +++++++++++++++++++++++++++++++
> arch/arm64/kernel/setup.c | 8 ++++
> arch/arm64/tools/cpucaps | 2 +
> include/linux/memory_ordering_model.h | 11 +++++
> include/uapi/linux/prctl.h | 5 +++
> kernel/sys.c | 21 +++++++++
> 13 files changed, 229 insertions(+), 6 deletions(-)
> ---
> base-commit: 4cece764965020c22cff7665b18a012006359095
> change-id: 20240411-tso-e86fdceb94b8
>

The series looks good to me.

Reviewed-by: Neal Gompa <neal@xxxxxxxxx>



--
真実はいつも一つ!/ Always, there's only one truth!