Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Peter Zijlstra
Date: Wed Jan 29 2020 - 13:40:40 EST


On Wed, Jan 29, 2020 at 04:29:43PM +0100, Marco Elver wrote:

> On Tue, 28 Jan 2020 at 17:52, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > I'm claiming that in the first case, the only thing that's ever done
> > with a racy load is comparing against 0, there is no possible bad
> > outcome ever. While obviously if you let the load escape, or do anything
> > other than compare against 0, there is.
>
> It might sound like a simple rule, but implementing this is anything
> but simple: This would require changing the compiler,

Right.

> which we said we'd like to avoid as it introduces new problems.

Ah, I missed that brief.

> This particular rule relies on semantic analysis that is beyond what
> the TSAN instrumentation currently supports. Right now we support GCC
> and Clang; changing the compiler probably means we'd end up with only
> one (probably Clang), and many more years before the change has
> propagated to the majority of used compiler versions. It'd be good if
> we can do this purely as a change in the kernel's codebase.

*sigh*, I didn't know there was such a resistance to change the tooling.
That seems very unfortunate :-/

> Keeping the bigger picture in mind, how frequent is this case, and
> what are we really trying to accomplish?

It's trying to avoid the RmW pulling the line in exclusive/modified
state in a loop. The basic C-CAS pattern if you will.

> Is it only to avoid a READ_ONCE? Why is the READ_ONCE bad here? If
> there is a racing access, why not be explicit about it?

It's probably not terrible to put a READ_ONCE() there; we just need to
make sure the compiler doesn't do something stupid (it is known to do
stupid when 'volatile' is present).

But the fact remains that it is entirely superfluous, there is no
possible way the compiler can wreck this.