Re: [RFC] Potential problem in qspinlock due to mixed-size accesses

From: Thomas Haas
Date: Tue Jun 17 2025 - 10:35:53 EST

Next message: Timothy Pearson: "[PATCH v7] PCI: Add pcie_link_is_active() to determine if the PCIe link"
Previous message: Naveen N Rao: "Re: [PATCH v3 12/62] KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creation"
In reply to: Will Deacon: "Re: [RFC] Potential problem in qspinlock due to mixed-size accesses"
Next in thread: Hernan Ponce de Leon: "Re: [RFC] Potential problem in qspinlock due to mixed-size accesses"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 17.06.25 16:17, Will Deacon wrote:

On Tue, Jun 17, 2025 at 10:42:04AM +0200, Hernan Ponce de Leon wrote:

On 6/17/2025 8:19 AM, Thomas Haas wrote:

On 16.06.25 16:23, Will Deacon wrote:

I'm half inclined to think that the Arm memory model should be tightened
here; I can raise that with Arm and see what they say.

Although the cited paper does give examples of store-forwarding from a
narrow store to a wider load, the case in qspinlock is further
constrained by having the store come from an atomic rmw and the load
having acquire semantics. Setting aside the MSA part, that specific case
_is_ ordered in the Arm memory model (and C++ release sequences rely on
it iirc), so it's fair to say that Arm CPUs don't permit forwarding from
an atomic rmw to an acquire load.

Given that, I don't see how this is going to occur in practice.

You are probably right. The ARM model's atomic-ordered-before relation

let aob = rmw | [range(rmw)]; lrs; [A | Q]

clearly orders the rmw-store with subsequent acquire loads (lrs = local-
read-successor, A = acquire).
If we treat this relation (at least the second part) as a "global
ordering" and extend it by "si" (same-instruction), then the problematic
reordering under MSA should be gone.
I quickly ran Dartagnan on the MSA litmus tests with this change to the
ARM model and all the tests still pass.

Even with this change I still get violations (both safety and termination)
for qspinlock with dartagnan.

Please can you be more specific about the problems you see?

I talked to Hernán personally and it turned out that he used the generic implementation of smp_cond_acquire (not sure if the name is correct) which uses a relaxed load followed by a barrier. In that case, replacing aob by aob;si does not change anything.
Indeed, even in the reported problem we used the generic implementation (I was unaware of this), though it is easy to check that changing the relaxed load to acquire does not give sufficient orderings.

I think the problem is actually with the Internal visibility axiom, because
only making that one stronger seems to remove the violations.

That sounds surprising to me, as there's nothing particularly weird about
Arm's coherence requirements when compared to other architectures, as far
as I'm aware.

I agree. The internal visibility axiom is not the issue I think.

--
=====================================

Thomas Haas

Technische Universität Braunschweig
Institut für Theoretische Informatik
Mühlenpfordtstr. 23, Raum IZ 343
38106 Braunschweig | Germany

t.haas@xxxxxxxxxxxxxxxxxx
https://www.tu-braunschweig.de/tcs/team/thomas-haas

Next message: Timothy Pearson: "[PATCH v7] PCI: Add pcie_link_is_active() to determine if the PCIe link"
Previous message: Naveen N Rao: "Re: [PATCH v3 12/62] KVM: SVM: Inhibit AVIC if ID is too big instead of rejecting vCPU creation"
In reply to: Will Deacon: "Re: [RFC] Potential problem in qspinlock due to mixed-size accesses"
Next in thread: Hernan Ponce de Leon: "Re: [RFC] Potential problem in qspinlock due to mixed-size accesses"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]