Re: Do we need to correct barriering in circular-buffers.rst?

From: Andrea Parri
Date: Fri Sep 27 2019 - 05:51:19 EST


On Mon, Sep 23, 2019 at 04:49:31PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 19, 2019 at 02:59:06PM +0100, David Howells wrote:
>
> > But I don't agree with this. You're missing half the barriers. There should
> > be *four* barriers. The document mandates only 3 barriers, and uses
> > READ_ONCE() where the fourth should be, i.e.:
> >
> > thread #1 thread #2
> >
> > smp_load_acquire(head)
> > ... read data from queue ..
> > smp_store_release(tail)
> >
> > READ_ONCE(tail)
> > ... add data to queue ..
> > smp_store_release(head)
> >
>
> Notably your READ_ONCE() pseudo code is lacking a conditional;
> kernel/events/ring_buffer.c writes it like so:
>
> *   kernel                             user
> *
> *   if (LOAD ->data_tail) {            LOAD ->data_head
> *                      (A)             smp_rmb()       (C)
> *      STORE $data                     LOAD $data
> *      smp_wmb()       (B)             smp_mb()        (D)
> *      STORE ->data_head               STORE ->data_tail
> *   }
> *
> * Where A pairs with D, and B pairs with C.
> *
> * In our case (A) is a control dependency that separates the load of
> * the ->data_tail and the stores of $data. In case ->data_tail
> * indicates there is no room in the buffer to store $data we do not.

To elaborate on this, dependencies are tricky... ;-)

For the record, the LKMM doesn't currently model "order" derived from
control dependencies to a _plain_ access (even if the plain access is
a write): in particular, the following is racy (as far as the current
LKMM is concerned):

C rb

{ }

P0(int *tail, int *data, int *head)
{
	if (READ_ONCE(*tail)) {
		*data = 1;
		smp_wmb();
		WRITE_ONCE(*head, 1);
	}
}

P1(int *tail, int *data, int *head)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*head);
	smp_rmb();
	r1 = *data;
	smp_mb();
	WRITE_ONCE(*tail, 1);
}

Replacing the plain "*data = 1" with "WRITE_ONCE(*data, 1)" (or doing
s/READ_ONCE(*tail)/smp_load_acquire(tail)/) suffices to avoid the race.
Maybe I'm short of imagination this morning... but I can't currently
see how the compiler could "break" the above scenario.
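As a rough userspace illustration, the acquire-load variant of the test can be transcribed into C11 atomics. This is a minimal sketch under an assumed mapping of the kernel primitives onto C11: READ_ONCE()/WRITE_ONCE() as relaxed atomic accesses, smp_load_acquire() as an acquire load, smp_rmb()/smp_wmb() as acquire/release fences (both stronger than the kernel primitives), and smp_mb() as a seq_cst fence. It is an analogy, not how the kernel defines these macros; p0() and p1() just mirror the litmus test. Note that `data` stays a plain int.

```c
#include <stdatomic.h>

/*
 * Userspace C11 analogue of the "rb" litmus test with
 * s/READ_ONCE(*tail)/smp_load_acquire(tail)/ applied.
 * Assumed mapping (an approximation, not the kernel's definitions):
 *   READ_ONCE/WRITE_ONCE -> relaxed atomic load/store
 *   smp_load_acquire     -> memory_order_acquire load
 *   smp_wmb()/smp_rmb()  -> release/acquire fences (stronger than needed)
 *   smp_mb()             -> seq_cst fence
 */
static atomic_int tail, head;
static int data;                /* deliberately a plain (non-atomic) int */

/* P0: only stores to data once it has observed tail != 0. */
static void p0(void)
{
	if (atomic_load_explicit(&tail, memory_order_acquire)) {
		data = 1;                                  /* plain store */
		atomic_thread_fence(memory_order_release); /* ~ smp_wmb() */
		atomic_store_explicit(&head, 1, memory_order_relaxed);
	}
}

/* P1: reads head, then the plain data, then publishes tail. */
static void p1(int *r0, int *r1)
{
	*r0 = atomic_load_explicit(&head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);   /* ~ smp_rmb() */
	*r1 = data;                                   /* plain load */
	atomic_thread_fence(memory_order_seq_cst);   /* ~ smp_mb() */
	atomic_store_explicit(&tail, 1, memory_order_relaxed);
}
```

C11 has no notion of control-dependency ordering at all, so here it is the acquire load that does the work: the seq_cst fence in p1() followed by the store to tail synchronizes with the acquire load in p0(), which makes P1's read of data happen-before P0's write of it when the two run on different threads.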

I also didn't spend much time thinking about it. memory-barriers.txt
has a section "CONTROL DEPENDENCIES" dedicated to "alerting developers
using control dependencies for ordering". That's quite a long section
(and probably still incomplete); the last paragraph summarizes: ;-)

(*) Compilers do not understand control dependencies. It is therefore
your job to ensure that they do not break your code.

Andrea

> *
> * D needs to be a full barrier since it separates the data READ
> * from the tail WRITE.
> *
> * For B a WMB is sufficient since it separates two WRITEs, and for C
> * an RMB is sufficient since it separates two READs.
>
> Where 'kernel' is the producer and 'user' is the consumer. This was
> written before load-acquire and store-release came about (I _think_),
> and I've so far resisted updating B to store-release because smp_wmb()
> is actually cheaper than store-release on a number of architectures
> (notably ARM).
>
> C ought to be a load-acquire, and D really should be a store-release, but
> I don't think the perf userspace has that (or uses C11).
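The end state Peter describes (B kept as a write barrier, C as a load-acquire, D as a store-release) can be sketched as a single-producer/single-consumer ring in C11. The struct layout, RB_SIZE, rb_produce() and rb_consume() are hypothetical names for illustration, not the perf mmap ABI. Two assumptions worth flagging: C11 does not recognize control dependencies, so (A) is written as an acquire load of data_tail; and B is approximated by a release fence followed by a relaxed store of data_head, which is stronger than smp_wmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 8u  /* hypothetical capacity; must be a power of two */

/* Hypothetical SPSC ring loosely modelled on the data_head/data_tail scheme. */
struct rb {
	_Atomic size_t data_head;  /* written only by the producer */
	_Atomic size_t data_tail;  /* written only by the consumer */
	int data[RB_SIZE];         /* plain payload slots */
};

/* Producer ("kernel" column). Returns false, storing nothing, when full. */
static bool rb_produce(struct rb *rb, int value)
{
	size_t head = atomic_load_explicit(&rb->data_head, memory_order_relaxed);
	/* (A): no control-dependency ordering in C11, so use acquire. */
	size_t tail = atomic_load_explicit(&rb->data_tail, memory_order_acquire);

	if (head - tail >= RB_SIZE)
		return false;                          /* no room */
	rb->data[head % RB_SIZE] = value;          /* STORE $data */
	atomic_thread_fence(memory_order_release); /* (B) ~ smp_wmb() */
	atomic_store_explicit(&rb->data_head, head + 1, memory_order_relaxed);
	return true;
}

/* Consumer ("user" column). Returns false when the ring is empty. */
static bool rb_consume(struct rb *rb, int *value)
{
	size_t tail = atomic_load_explicit(&rb->data_tail, memory_order_relaxed);
	/* (C) as a load-acquire of data_head */
	size_t head = atomic_load_explicit(&rb->data_head, memory_order_acquire);

	if (head == tail)
		return false;
	*value = rb->data[tail % RB_SIZE];         /* LOAD $data */
	/* (D) as a store-release of data_tail */
	atomic_store_explicit(&rb->data_tail, tail + 1, memory_order_release);
	return true;
}
```

Both indices are free-running counters: because RB_SIZE is a power of two, `head - tail` and `% RB_SIZE` remain correct across size_t wraparound.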
