Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus test)

From: Alan Stern
Date: Wed Jan 25 2023 - 15:47:28 EST


On Wed, Jan 25, 2023 at 11:46:51AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 25, 2023 at 02:08:59PM -0500, Alan Stern wrote:
> > Why do you want the implementation to forbid it? The pattern of the
> > litmus test resembles 3+3W, and you don't care whether the kernel allows
> > that pattern. Do you?
>
> Jonas asked a similar question, so I am answering you both here.
>
> With (say) a release-WRITE_ONCE() chain implementing N+2W for some
> N, it is reasonably well known that you don't get ordering, hardware
> support notwithstanding. After all, none of the Linux kernel, C, and C++
> memory models make that guarantee. In addition, the non-RCU barriers
> and accesses that you can use to create N+2W have been in very wide use
> for a very long time.
>
> Although RCU has been in use for almost as long as those non-RCU barriers,
> it has not been in wide use for anywhere near that long. So I cannot
> be so confident in ruling out some N+2W use case for RCU.
>
> Such a use case could play out as follows:
>
> 1. They try LKMM on it, see that LKMM allows it, and therefore find
> something else that works just as well. This is fine.
>
> 2. They try LKMM on it, see that LKMM allows it, but cannot find
> something else that works just as well. They complain to us,
> and we either show them how to get the same results some other
> way or adjust LKMM (and perhaps the implementations) accordingly.
> These are also fine.
>
> 3. They don't try LKMM on it, see that it works when they test it,
> and they send it upstream. The use case is entangled deeply
> enough in other code that no one spots it on review. The Linux
> kernel unconditionally prohibits the cycle. This too is fine.
>
> 4. They don't try LKMM on it, see that it works when they test it,
> and they send it upstream. The use case is entangled deeply
> enough in other code that no one spots it on review. Because RCU
> grace periods incur tens of microseconds of latency at a minimum,
> all tests (almost) always pass, just due to delays and unrelated
> accesses and memory barriers. Even in kernels built with some
> future SRCU equivalent of CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> But the Linux kernel allows the cycle when there is a new moon
> on Tuesday during a triple solar eclipse of Jupiter, a condition
> that is eventually met, and at the worst possible time and place.
>
> This is absolutely the opposite of fine.
>
> I don't want to deal with #4. So this is an RCU-maintainer use case
> that I would like to avoid. ;-)

Since it is well known that the non-RCU barriers in the Linux kernel, C,
and C++ do not enforce ordering in n+nW, and seeing as how your litmus
test relies on an smp_store_release() at one point, I think it's
reasonable to assume people won't expect it to provide ordering.

Ah, but what about a litmus test that relies solely on RCU?

rcu_read_lock     Wy=2              rcu_read_lock     Wv=2
Wx=2              synchronize_rcu   Wu=2              synchronize_rcu
Wy=1              Wu=1              Wv=1              Wx=1
rcu_read_unlock                     rcu_read_unlock

exists (x=2 /\ y=2 /\ u=2 /\ v=2)

Luckily, this _is_ forbidden by the LKMM. So I think you're okay.
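
For anyone who wants to feed this to herd7, here is a sketch of the test
above written out in litmus form, reading it as four threads (one per
column of the shorthand). The test name C-RCU-only-4W is made up for
illustration, and the use of WRITE_ONCE() for every "W" is an assumption
about what the shorthand means.

```litmus
C C-RCU-only-4W

(*
 * Hypothetical name. Per the message above, LKMM is expected to
 * forbid the "exists" clause (two grace periods, two critical
 * sections, no non-RCU barriers).
 *)

{}

P0(int *x, int *y)
{
	rcu_read_lock();
	WRITE_ONCE(*x, 2);
	WRITE_ONCE(*y, 1);
	rcu_read_unlock();
}

P1(int *y, int *u)
{
	WRITE_ONCE(*y, 2);
	synchronize_rcu();
	WRITE_ONCE(*u, 1);
}

P2(int *u, int *v)
{
	rcu_read_lock();
	WRITE_ONCE(*u, 2);
	WRITE_ONCE(*v, 1);
	rcu_read_unlock();
}

P3(int *v, int *x)
{
	WRITE_ONCE(*v, 2);
	synchronize_rcu();
	WRITE_ONCE(*x, 1);
}

exists (x=2 /\ y=2 /\ u=2 /\ v=2)
```

Saved to a file, it should be checkable from tools/memory-model with
something like "herd7 -conf linux-kernel.cfg <file>".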

Alan