Re: perf_event wakeup_events = 0

From: Theodore Dubois
Date: Sat Sep 07 2019 - 19:29:25 EST


On Sep 7, 2019, at 3:45 PM, Valdis Klētnieks <valdis.kletnieks@xxxxxx> wrote:

> So an entry is made in the buffer. It's not clear that this immediately triggers
> a signal...

I think the documentation says it does when wakeup_events is 1. The code for
perf backs this up:
https://github.com/torvalds/linux/blob/a9815a4fa2fd297cab9fa7a12161b16657290293/tools/perf/util/evsel.c#L1051-L1054
The puzzle is what happens when wakeup_events is 0. The documentation saying
"more recent kernels treat 0 the same as 1" suggests it should behave the same,
but then why would perf set it to 1 after zero-initializing it?

> So you need to look at what size mmap buffer is being allocated. It's *probably*
> on the order of megabytes, so that you can buffer a fairly large number of entries
> and not take several user/kernel transitions on every single entry...

It's 512 KiB. Each sample is 40 bytes (the sample_type is IP | TID | TIME |
PERIOD, each of those is 8 bytes, and the record header adds another 8).
40 bytes per sample * 4000 samples per second * 1.637 seconds is 261920 bytes,
which is almost exactly half the buffer (262144 bytes).
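
As a quick sanity check of that arithmetic, here is a small Python sketch. It
assumes the 40 bytes break down as four 8-byte fields plus an 8-byte record
header, and that samples really do arrive at a steady 4000 per second; the
variable names are only illustrative and do not come from perf itself.

```python
# Back-of-the-envelope check: how full is the ring buffer when poll() returns?
# Assumption: each sample record is an 8-byte header plus four 8-byte fields.
HEADER_BYTES = 8
FIELD_BYTES = 8
FIELDS = ("IP", "TID", "TIME", "PERIOD")
SAMPLE_BYTES = HEADER_BYTES + FIELD_BYTES * len(FIELDS)

SAMPLE_FREQ_HZ = 4000        # sample_freq from the perf_event_open call
WAKEUP_AFTER_MS = 1637       # time until poll() returned, from strace
BUFFER_BYTES = 512 * 1024    # data area of the mmap'd ring buffer

# Integer math in milliseconds avoids floating-point rounding noise.
bytes_written = SAMPLE_BYTES * SAMPLE_FREQ_HZ * WAKEUP_AFTER_MS // 1000
fill_ratio = bytes_written / BUFFER_BYTES

print("bytes per sample:", SAMPLE_BYTES)
print("bytes written before wakeup:", bytes_written)
print("half of the buffer:", BUFFER_BYTES // 2)
print("fill ratio:", fill_ratio)
```

The written byte count lands within a few hundred bytes of half the buffer,
which is what makes a "wake up at half full" rule look plausible.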

So does wakeup_events = 0 mean it causes a wakeup when the buffer is half
full? I don't see anything in the man page about this....

If you'd like to try it yourself, this is the strace command I've been using:
strace -ttTv -eperf_event_open,mmap,poll -operf.strace perf record stress --cpu 1 --timeout 1

~Theodore

>
> On Sat, 07 Sep 2019 09:14:49 -0700, Theodore Dubois said:
>
> Reading what it actually says rather than what I thought it said.. :)
>
> Events come in two flavors: counting and sampled. A counting event is
> one that is used for counting the aggregate number of events that
> occur. In general, counting event results are gathered with a read(2)
> call. A sampling event periodically writes measurements to a buffer
> that can then be accessed via mmap(2).
>
> For some reason, I was thinking counting events. -ENOCAFFEINE. :)
>
>> sample_freq is 4000 (and freq is 1). Here's the man page on this field:
>>
>> sample_period, sample_freq
>> A "sampling" event is one that generates an overflow notifica-
>> tion every N events, where N is given by sample_period. A sam-
>> pling event has sample_period > 0.
>
> There's this part:
>> pling event has sample_period > 0. When an overflow occurs,
>> requested data is recorded in the mmap buffer. The sample_type
>> field controls what data is recorded on each overflow.
>
> So an entry is made in the buffer. It's not clear that this immediately triggers
> a signal...
>
> MMAP layout
> When using perf_event_open() in sampled mode, asynchronous events (like
> counter overflow or PROT_EXEC mmap tracking) are logged into a ring-
> buffer. This ring-buffer is created and accessed through mmap(2).
>
> The mmap size should be 1+2^n pages, where the first page is a metadata
> page (struct perf_event_mmap_page) that contains various bits of infor-
> mation such as where the ring-buffer head is.
>
> So you need to look at what size mmap buffer is being allocated. It's *probably*
> on the order of megabytes, so that you can buffer a fairly large number of entries
> and not take several user/kernel transitions on every single entry...
>
>> If I'm reading this right, this is a sampling event which overflows 4000 times a second.
>
> And 4,000 entries are made in the buffer per second..
>
>> But perf then does a poll call which wakes up on this FD with POLLIN after
>> 1.637 seconds, instead of 0.00025 seconds
>
> At which point perf goes and looks at several thousand entries in the ring buffer...