Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers

From: Leo Yan
Date: Mon May 31 2021 - 13:05:50 EST


Hi Peter, Adrian,

On Thu, May 27, 2021 at 11:57:37AM +0200, Peter Zijlstra wrote:
> On Thu, May 27, 2021 at 12:24:15PM +0300, Adrian Hunter wrote:
>
> > > If all we want is a compiler barrier, then shouldn't that be what we use?
> > > i.e. barrier()

Sorry for the late reply. I'd like to raise one question before I
respin this patch set.

> > I guess you are saying we still need to stop potential re-ordering across
> > CPUs, so please ignore my comments.
>
> Right; so the ordering issue is real, consider:
>
>     CPU0 (kernel)       CPU1 (user)
>
>     write data          read head
>     smp_wmb()           smp_rmb()
>     write head          read data
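
For reference, the pairing quoted above can be sketched in user-space
C11. This is only an illustration under stated assumptions: "struct
ring", ring_publish() and ring_sum() are hypothetical names, not the
perf ABI, and the C11 release/acquire fences are used as rough
user-space analogs of smp_wmb()/smp_rmb(), not as exact equivalents.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 64

/* Hypothetical ring buffer; the layout is illustrative only. */
struct ring {
	uint64_t data[RING_SIZE];
	_Atomic uint64_t head;	/* number of entries published so far */
};

/* Producer side (CPU0 in the diagram): write data, barrier, write head. */
static void ring_publish(struct ring *r, uint64_t value)
{
	uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

	r->data[head % RING_SIZE] = value;			/* write data */
	atomic_thread_fence(memory_order_release);		/* analog of smp_wmb() */
	atomic_store_explicit(&r->head, head + 1,
			      memory_order_relaxed);		/* write head */
}

/* Consumer side (CPU1 in the diagram): read head, barrier, read data. */
static uint64_t ring_sum(struct ring *r, uint64_t tail, uint64_t *new_tail)
{
	uint64_t head = atomic_load_explicit(&r->head,
					     memory_order_relaxed); /* read head */
	uint64_t sum = 0;

	atomic_thread_fence(memory_order_acquire);		/* analog of smp_rmb() */
	for (; tail < head; tail++)
		sum += r->data[tail % RING_SIZE];		/* read data */
	*new_tail = tail;
	return sum;
}
```

The sketch assumes a single producer and that the producer never
overwrites entries the consumer has not read yet; overflow handling is
left out on purpose.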

One thing worth mentioning is that the Linux kernel does _not_ use an
explicit "smp_wmb()" between writing the AUX trace data and updating the
header field "aux_head". Please see the function perf_aux_output_end():

  void perf_aux_output_end(..., size)
  {
          ...

          if (size || (handle->aux_flags & ~(u64)PERF_AUX_FLAG_OVERWRITE))
                  perf_event_aux_event(handle->event, aux_head, size,
                                       handle->aux_flags);

          WRITE_ONCE(rb->user_page->aux_head, rb->aux_head);

          ...
  }

But I think it's unnecessary to add "smp_wmb()" before the WRITE_ONCE()
statement. This is because:

Before updating "aux_head", the function calls perf_event_aux_event(),
which fills a PERF_RECORD_AUX event into the perf ring buffer, executes
smp_wmb() and then updates the header "user_page->data_head"; so the
flow is as below:

  Fill AUX trace data to AUX ring buffer
  Fill RECORD_AUX event into perf ring buffer
  smp_wmb()
  update "user_page->data_head" -> See perf_event_aux_event()/perf_output_end()
  update "user_page->aux_head"

It is a bit odd that two ring buffers (the AUX ring buffer and the
generic perf ring buffer) share the same memory barrier between writing
the data and writing the heads.
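
To make the flow concrete, here is a minimal user-space sketch of it in
C11. Everything in it is an assumption for illustration: "struct
fake_rb", "struct fake_user_page" and fake_aux_output_end() are
hypothetical stand-ins rather than kernel structures, and a C11 release
fence is used as a rough analog of smp_wmb(). It only models the path
where perf_event_aux_event() is actually called.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BUF_SIZE 256

/* Hypothetical stand-in for the user page; not perf_event_mmap_page. */
struct fake_user_page {
	_Atomic uint64_t data_head;
	_Atomic uint64_t aux_head;
};

/* Hypothetical pair of ring buffers sharing one user page. */
struct fake_rb {
	uint8_t aux_buf[BUF_SIZE];	/* AUX ring buffer */
	uint8_t data_buf[BUF_SIZE];	/* generic perf ring buffer */
	struct fake_user_page page;
};

static void fake_aux_output_end(struct fake_rb *rb, const uint8_t *trace,
				uint64_t size)
{
	uint64_t aux_head = atomic_load_explicit(&rb->page.aux_head,
						 memory_order_relaxed);
	uint64_t data_head = atomic_load_explicit(&rb->page.data_head,
						  memory_order_relaxed);
	uint64_t i;

	/* 1. Fill AUX trace data into the AUX ring buffer. */
	for (i = 0; i < size; i++)
		rb->aux_buf[(aux_head + i) % BUF_SIZE] = trace[i];

	/* 2. Fill a one-byte "record" into the generic ring buffer. */
	rb->data_buf[data_head % BUF_SIZE] = (uint8_t)size;

	/* 3. The single barrier; it precedes both head updates below. */
	atomic_thread_fence(memory_order_release);

	/* 4. Update data_head. */
	atomic_store_explicit(&rb->page.data_head, data_head + 1,
			      memory_order_relaxed);

	/* 5. Update aux_head; steps 1 and 2 are ordered before this store too. */
	atomic_store_explicit(&rb->page.aux_head, aux_head + size,
			      memory_order_relaxed);
}
```

In this sketch the fence in step 3 orders the stores of steps 1 and 2
before both head stores, which is the sharing described above. It says
nothing about the case where the event is skipped and only "aux_head"
is written.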

Do you think my understanding is correct? Or should I add an explicit
"smp_wmb()" before WRITE_ONCE(rb->user_page->aux_head, ...)?

Thanks,
Leo

> Without explicit ordering (on either side), we might either read data
> that isn't written yet:
>
>                         ,--(read data)
>     write data          |
>     smp_wmb()           |
>     write head ---.     |
>                   `-->  |  read head
>                         `- read data
>
> Where the head load observes the new head written, but the data load is
> speculated and loads data before it is written.
>
> Or, we can write the head before the data write is visible:
>
>     ,-- write data
>     |   write head
>     |                   read head
>     |                   smp_rmb()
>     |                   read data
>     `-> (data visible)
>
> Where we read the new head, but still observe stale data because the
> stores got committed out of order.
>
> x86 is TSO, so neither reordering is actually possible, hence both
> barriers being a compiler barrier (to ensure the compiler doesn't
> reorder them for us). But weaker hardware *will* allow those orderings
> and we very much need actual barriers there.
>
> Welcome to the magical world of memory ordering and weak architectures.