Re: [PATCH v10] perf: Sharing PMU counters across compatible events

From: Peter Zijlstra
Date: Fri Feb 28 2020 - 04:46:34 EST


On Fri, Feb 28, 2020 at 10:36:04AM +0100, Peter Zijlstra wrote:
> +
> + /*
> + * Flip an active event to a new master; this is tricky because
> + * for an active event event_pmu_read() can be called at any
> + * time from NMI context.
> + *
> + * This means we need to have ->dup_master and
> + * ->dup_count consistent at all times. Of course we cannot do
> + * two writes at once :/
> + *
> + * Instead, flip ->dup_master to EVENT_TOMBSTONE, this will
> + * make event_pmu_read_dup() NOP. Then we can set
> + * ->dup_count and finally set ->dup_master to the new_master
> + * to let event_pmu_read_dup() rip.
> + */
> + WRITE_ONCE(tmp->dup_master, EVENT_TOMBSTONE);
> + barrier();
> +
> + count = local64_read(&new_master->count);
> + local64_set(&tmp->dup_count, count);
> +
> + if (tmp == new_master)
> + local64_set(&tmp->master_count, count);
> +
> + barrier();
> + WRITE_ONCE(tmp->dup_master, new_master);
> dup_count++;
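
For completeness, the NMI-side reader that comment has to be safe against
looks roughly like so -- sketch only, not the actual hunk, and the
field/helper names are from memory.  The only point is that a tombstoned
->dup_master makes the whole thing a NOP, so the reader never mixes a
stale ->dup_count with a new master:

/*
 * Sketch only (not the patch): NMI-context read of a duplicated event.
 */
static void event_pmu_read_dup(struct perf_event *event)
{
	struct perf_event *master = READ_ONCE(event->dup_master);
	u64 new_count, prev_count;

	/* flip in progress; bail, the new master gets picked up later */
	if (master == EVENT_TOMBSTONE)
		return;

	/* read the shared hardware counter through the current master */
	master->pmu->read(master);
	new_count = local64_read(&master->count);

	/* fold the delta since our last snapshot into our own count */
	prev_count = local64_xchg(&event->dup_count, new_count);
	local64_add(new_count - prev_count, &event->count);
}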

> @@ -4453,12 +4484,14 @@ static void __perf_event_read(void *info
>
> static inline u64 perf_event_count(struct perf_event *event)
> {
> - if (event->dup_master == event) {
> - return local64_read(&event->master_count) +
> - atomic64_read(&event->master_child_count);
> - }
> + u64 count;
>
> - return local64_read(&event->count) + atomic64_read(&event->child_count);
> + if (likely(event->dup_master != event))
> + count = local64_read(&event->count);
> + else
> + count = local64_read(&event->master_count);
> +
> + return count + atomic64_read(&event->child_count);
> }
>
> /*

One thing that I've failed to mention so far (though it has sorta been
implied if you've been reading carefully) is that ->dup_master and
->master_count also need to be consistent at all times, because even
!ACTIVE events can have perf_event_count() called on them.

Worse: I just realized that perf_event_count() can be called remotely, so
we also need SMP ordering between reading ->dup_master and reading
->master_count, *groan*...
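
Something like the pairing below would cover that, I think -- sketch
only; it collapses the tombstone dance above and uses release/acquire
instead of the bare barrier()s, just to show which store has to pair
with which load:

	/* update side: publish ->master_count before the new ->dup_master */
	local64_set(&tmp->master_count, count);
	/* pairs with the smp_load_acquire() in perf_event_count() below */
	smp_store_release(&tmp->dup_master, new_master);

/* remote read side */
static inline u64 perf_event_count(struct perf_event *event)
{
	struct perf_event *master = smp_load_acquire(&event->dup_master);
	u64 count;

	if (likely(master != event))
		count = local64_read(&event->count);
	else
		count = local64_read(&event->master_count);

	return count + atomic64_read(&event->child_count);
}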