Re: [PATCH] riscv/atomic.h: optimize ops with acquire/release ordering

From: Puranjay Mohan
Date: Tue May 07 2024 - 10:07:35 EST


Andrea Parri <parri.andrea@xxxxxxxxx> writes:

> Hi Puranjay,
>
> On Sun, May 05, 2024 at 12:33:40PM +0000, Puranjay Mohan wrote:
>> Currently, atomic ops with acquire or release ordering are implemented
>> as atomic ops with relaxed ordering followed by or preceded by an
>> acquire fence or a release fence.
>>
>> Section 8.1 of the "The RISC-V Instruction Set Manual Volume I:
>> Unprivileged ISA", titled, "Specifying Ordering of Atomic Instructions"
>> says:
>>
>> | To provide more efficient support for release consistency [5], each
>> | atomic instruction has two bits, aq and rl, used to specify additional
>> | memory ordering constraints as viewed by other RISC-V harts.
>>
>> and
>>
>> | If only the aq bit is set, the atomic memory operation is treated as
>> | an acquire access.
>> | If only the rl bit is set, the atomic memory operation is treated as a
>> | release access.
>>
>> So, rather than using two instructions (relaxed atomic op + fence), use
>> a single atomic op instruction with acquire/release ordering.
>>
>> Example program:
>>
>> atomic_t cnt = ATOMIC_INIT(0);
>> atomic_fetch_add_acquire(1, &cnt);
>> atomic_fetch_add_release(1, &cnt);
>>
>> Before:
>>
>> amoadd.w a4,a5,(a4) // Atomic add with relaxed ordering
>> fence r,rw // Fence to force Acquire ordering
>>
>> fence rw,w // Fence to force Release ordering
>> amoadd.w a4,a5,(a4) // Atomic add with relaxed ordering
>>
>> After:
>>
>> amoadd.w.aq a4,a5,(a4) // Atomic add with Acquire ordering
>>
>> amoadd.w.rl a4,a5,(a4) // Atomic add with Release ordering
>>
>> Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx>
>
> Your changes are effectively partially reverting:
>
> 5ce6c1f3535fa ("riscv/atomic: Strengthen implementations with fences")
>
> Can you please provide (and possibly include in the changelog of v2) a more
> thoughtful explanation for the correctness of such revert?
>
> (Anticipating a somewhat non-trivial analysis...)

Hi Andrea,

I think Guo Ren sent a patch[1] like this earlier but it did not get
comments yet. I will reply on that thread[1] as well.

I saw the commit 5ce6c1f3535fa ("riscv/atomic: Strengthen
implementations with fences") and all the related discussions.

This is what I understand from the discussions:

RISCV's LR.aq/SC.rl were following RCpc ordering but the LKMM expected
RCsc ordering from lock() and unlock(). So you added fences to force RCsc
ordering in the locks/atomics.

An experiment with LKMM and RISCV MM:

The following litmus test should not reach (1:r0=1 /\ 1:r1=0) with LKMM:

C unlock-lock-read-ordering

{}
/* s initially owned by P1 */

P0(int *x, int *y)
{
WRITE_ONCE(*x, 1);
smp_wmb();
WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, spinlock_t *s)
{
int r0;
int r1;

r0 = READ_ONCE(*y);
spin_unlock(s);
spin_lock(s);
r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)

Which is indeed true:

Test unlock-lock-read-ordering Allowed
States 3
1:r0=0; 1:r1=0;
1:r0=0; 1:r1=1;
1:r0=1; 1:r1=1;
No
Witnesses
Positive: 0 Negative: 3
Flag unmatched-unlock
Condition exists (1:r0=1 /\ 1:r1=0)
Observation unlock-lock-read-ordering Never 0 3
Time unlock-lock-read-ordering 0.01
Hash=ab0cfdcde54d1bb1faa731533980f424

And when I map this test to RISC-V:

RISCV RISCV-unlock-lock-read-ordering
{
0:x2=x;
0:x4=y;

1:x2=x;
1:x4=y;
1:x6=s;
}
P0 | P1 ;
ori x1,x0,1 | lw x1,0(x4) ;
sw x1,0(x2) | amoswap.w.rl x0,x0,(x6) ;
fence w,w | ori x5,x0,1 ;
ori x3,x0,1 | amoswap.w.aq x0,x5,(x6) ;
sw x3,0(x4) | lw x3,0(x2) ;
exists (1:x1=1 /\ 1:x3=0)

This also doesn't reach the condition:

Test RISCV-unlock-lock-read-ordering Allowed
States 3
1:x1=0; 1:x3=0;
1:x1=0; 1:x3=1;
1:x1=1; 1:x3=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (1:x1=1 /\ 1:x3=0)
Observation RISCV-unlock-lock-read-ordering Never 0 3
Time RISCV-unlock-lock-read-ordering 0.01
Hash=d845d36e2a8480165903870d135dd81e

Your commit mentioned that the above test would reach the exists
condition for RISCV.

So, maybe the model has been modified to make .aq and .rl RCsc now?

This presentation[2] by Dan Lustig says on page 31:

| PPO RULES 5-7
| A release that precedes an acquire in program
| order also precedes it in global memory order
| • i.e., the RCsc variant of release consistency

If above is true, removing the weak fences and using LR, SC, AMOs with
aq, rl, and aqrl bits could be used in the kernel AMOs and locks.

[1] https://lore.kernel.org/all/20220420144417.2453958-3-guoren@xxxxxxxxxx/
[2] https://riscv.org/wp-content/uploads/2018/05/14.25-15.00-RISCVMemoryModelTutorial.pdf

Thanks,
Puranjay