Re: [PATCH v2 00/15] Introduce threaded trace streaming for basic perf record operation

From: Alexey Budankov
Date: Mon Oct 26 2020 - 13:59:15 EST



On 24.10.2020 18:43, Jiri Olsa wrote:
> On Wed, Oct 21, 2020 at 06:52:43PM +0300, Alexey Budankov wrote:
>>
>> Changes in v2:
>> - explicitly added credit tags to patches 6/15 and 15/15,
>> additionally to cites [1], [2]
>> - updated description of 3/15 to explicitly mention the reason
>> to open data directories in read access mode (e.g. for perf report)
>> - implemented fix for compilation error of 2/15
>> - explicitly elaborated on found issues to be resolved for
>> threaded AUX trace capture
>>
>> v1: https://lore.kernel.org/lkml/810f3a69-0004-9dff-a911-b7ff97220ae0@xxxxxxxxxxxxxxx/
>>
>> Patch set provides threaded trace streaming for base perf record
>> operation. Provided streaming mode (--threads) mitigates profiling
>> data losses and resolves scalability issues of serial and asynchronous
>> (--aio) trace streaming modes on multicore server systems. The patch
>> set is based on the prototype [1], [2] and the most closely relates
>> to mode 3) "mode that creates thread for every monitored memory map".
>
> so what I liked about the previous code was that you could
> configure how the threads would be created
>
> default --threads options created thread for each cpu like
> in your change:
>
> $ perf record -v --threads ...
> ...
> thread 0 monitor: 0 allowed: 0
> thread 1 monitor: 1 allowed: 1
> thread 2 monitor: 2 allowed: 2
> thread 3 monitor: 3 allowed: 3
> thread 4 monitor: 4 allowed: 4
> thread 5 monitor: 5 allowed: 5
> thread 6 monitor: 6 allowed: 6
> thread 7 monitor: 7 allowed: 7

Yes, it is configurable in the prototype. This patch set doesn't
implement those parameters for the --threads option, simply because
VTune doesn't have use cases for them yet. It has still been designed
and implemented with that extension in mind, so they could easily be
added on top of it.

>
>
> then numa based:
>
> $ perf record -v --threads=numa ...
> ...
> thread 0 monitor: 0-5,12-17 allowed: 0-5,12-17
> thread 1 monitor: 6-11,18-23 allowed: 6-11,18-23
>
>
> socket based:
>
> $ perf record -v --threads=socket ...
> ...
> thread 0 monitor: 0-7 allowed: 0-7
>
>
> core based:
>
> $ perf record -v --threads=core ...
> ...
> thread 0 monitor: 0,4 allowed: 0,4
> thread 1 monitor: 1,5 allowed: 1,5
> thread 2 monitor: 2,6 allowed: 2,6
> thread 3 monitor: 3,7 allowed: 3,7
>
>
> and user configurable:
>
> $ perf record -v --threads=0-3/0:4-7/4 ...
> ...
> threads: 0. monitor 0-3, allowed 0
> threads: 1. monitor 4-7, allowed 4
>
>
> so this way you could easily pin threads to cpu/core/socket/numa,
> or to some other cpu of your choice, because this will be always
> game of try and check where I'm not getting LOST events and not
> creating 1000 threads
>
> perf record: Add support for threads numa option value
> perf record: Add support for threads socket option value
> perf record: Add support for threads core option value
> perf record: Add support for threads user option value

Makes sense.
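
For illustration only, the user-configurable value quoted above
(--threads=0-3/0:4-7/4) can be read as a colon-separated list of
<monitored cpus>/<allowed cpus> pairs, one pair per thread. Below is a
minimal sketch in Python, assuming that syntax as inferred from the
example alone. parse_cpu_list() and parse_threads_spec() are
hypothetical names, not code from perf or from the prototype.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty CPU range in {text!r}")
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError(f"descending CPU range {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def parse_threads_spec(spec):
    """Split '<monitor>/<allowed>:<monitor>/<allowed>...' into per-thread pairs.

    Assumed syntax (taken from the quoted example only): ':' separates
    threads, '/' separates the CPUs a thread monitors from the CPUs it
    is allowed to run on.
    """
    threads = []
    for entry in spec.split(":"):
        monitor, sep, allowed = entry.partition("/")
        if not sep:
            raise ValueError(f"missing '/' in thread entry {entry!r}")
        threads.append((parse_cpu_list(monitor), parse_cpu_list(allowed)))
    return threads


# Print the mapping for the spec used in the quoted example.
for index, (monitor, allowed) in enumerate(parse_threads_spec("0-3/0:4-7/4")):
    print(f"thread {index}: monitor {monitor}, allowed {allowed}")
```

The same cpu list parser would also cover masks like the numa ones
above (0-5,12-17). How the resulting sets are applied to the streaming
threads is not shown here.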

Alexei