Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Linus Torvalds
Date: Tue Feb 02 2016 - 13:06:41 EST


On Tue, Feb 2, 2016 at 9:51 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
>
> Given that the vast majority of weakly ordered architectures respect
> address dependencies, I would expect all of them to be hurt if they
> were forced to use barrier instructions instead, even those where the
> microarchitecture is fairly strongly ordered in practice.

I do wonder if it would be all that noticeable, though. I don't think
we've really had benchmarks.

For example, most of the RCU list traversal shows up on x86 - where
loads are already acquires. But they show up not because of that, but
because an RCU list traversal is pretty much always going to take the
cache miss.

So it would actually be interesting to just try it - what happens to
kernel-centric benchmarks (which are already fairly rare) on arm if we
change the rcu_dereference() to be a smp_load_acquire()?

Because maybe nothing happens at all. I don't think we've ever tried it.
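
To make the proposed experiment concrete, here is a minimal user-space
sketch of the two reader flavours, written with C11 atomics rather than
the kernel primitives. Everything in it (struct node, head, publish(),
read_dependency(), read_acquire(), self_check()) is a hypothetical name
made up for illustration. memory_order_consume stands in for
rcu_dereference() and memory_order_acquire for smp_load_acquire(); that
mapping is only an approximation, and since compilers typically
implement consume as acquire, this sketch shows the shape of the change
but would not measure its cost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; not a kernel structure. */
struct node {
    int value;
    struct node *next;
};

/* Shared pointer that a writer publishes and readers dereference. */
static _Atomic(struct node *) head;

/* Writer side: initialise the node first, then publish it with release. */
static void publish(struct node *n)
{
    atomic_store_explicit(&head, n, memory_order_release);
}

/* Reader relying on the address dependency (rough rcu_dereference() analogue). */
static int read_dependency(void)
{
    struct node *n = atomic_load_explicit(&head, memory_order_consume);
    return n ? n->value : -1;
}

/* Reader using a full acquire load (rough smp_load_acquire() analogue). */
static int read_acquire(void)
{
    struct node *n = atomic_load_explicit(&head, memory_order_acquire);
    return n ? n->value : -1;
}

/* Single-threaded sanity check: both readers see the published node. */
static void self_check(void)
{
    static struct node n = { .value = 42, .next = NULL };

    assert(read_dependency() == -1);   /* nothing published yet */
    assert(read_acquire() == -1);
    publish(&n);
    assert(read_dependency() == 42);
    assert(read_acquire() == 42);
}
```

Calling self_check() once from a main() is enough to exercise both
readers. The only difference between them is the ordering argument on
the load, which is the whole point of the question: whether the stronger
ordering costs anything measurable next to the cache miss.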

> As far as I understand it, the problems with "consume" have centred
> largely around compiler and specification issues, which we don't have
> with rcu_dereference (i.e. we ignore thin-air and use volatile casts
> /barrier() to keep the optimizer at bay).

Oh, I agree. The C++ consume orderings have been different from the
kernel worries.

But if it turns out that we have situations where we lose transitivity
because of rcu_dereference not being an acquire, then we have kernel
problems.

I do see that your later email said that the pointer dependency (which
we assume in rcu) should retain the transitivity, so maybe there is no
real reason to strengthen things.

But I _would_ argue that transitivity is so important (because of the
whole "individual orderings make sense and are causal, so the end
result must make sense and be causal") that if that were to break,
then such an architecture really should just strengthen the orderings.

Because the *worst* kinds of bugs are exactly the ones where the code
makes sense and works locally, but then the combination of two or
three pieces of code that are individually sensible ends up not
working for some crazy non-transitivity reason. That really is not
something that people can cope with.
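
The kind of three-piece combination meant here can be sketched as a
small write-to-read causality litmus test. This is an illustration in
C11 atomics, not kernel code, and all names (x, y, t0_write_x(),
t1_read_x_publish_y(), t2_read_y_then_x(), outcome_is_non_causal(),
run_sequential_check()) are hypothetical. Each piece is locally
sensible; the question is whether the combined outcome r1 == 1,
r2 == 1, r3 == 0 can happen. Under the C11 rules it cannot, because the
release/acquire pair orders T1's read of x before T2's read of x. The
check at the end runs the three pieces one after another in a single
thread, so it only confirms the causal outcome and is not a concurrency
test.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;   /* both start at 0 (static storage) */

/* T0: plain write to x. */
static void t0_write_x(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
}

/* T1: read x, then publish y with release. Returns r1. */
static int t1_read_x_publish_y(void)
{
    int r1 = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&y, 1, memory_order_release);
    return r1;
}

struct t2_result {
    int r2;   /* value read from y */
    int r3;   /* value read from x */
};

/* T2: acquire-read y, then read x. */
static struct t2_result t2_read_y_then_x(void)
{
    struct t2_result r;

    r.r2 = atomic_load_explicit(&y, memory_order_acquire);
    r.r3 = atomic_load_explicit(&x, memory_order_relaxed);
    return r;
}

/* The outcome that would mean causality (transitivity) was lost. */
static int outcome_is_non_causal(int r1, int r2, int r3)
{
    return r1 == 1 && r2 == 1 && r3 == 0;
}

/* Sequential run of T0, T1, T2; returns 1 only on the non-causal outcome. */
static int run_sequential_check(void)
{
    t0_write_x();
    int r1 = t1_read_x_publish_y();
    struct t2_result r = t2_read_y_then_x();

    assert(!outcome_is_non_causal(r1, r.r2, r.r3));
    return outcome_is_non_causal(r1, r.r2, r.r3);
}
```

To look for the non-causal outcome for real, the three functions would
have to run on separate CPUs many times over on weakly ordered hardware,
which is what litmus-test tools do.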

And maybe we will one day have widely available automated ordering
proofs that work across the whole kernel, and we don't even need to
worry about "this breaks people's minds", but I don't think we are
there yet.

Linus