Re: [RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events
From: Riccardo Mancini
Date: Thu Jul 22 2021 - 12:16:19 EST
Hi Jiri,
On Mon, 2021-07-19 at 23:13 +0200, Jiri Olsa wrote:
> On Tue, Jul 13, 2021 at 02:11:11PM +0200, Riccardo Mancini wrote:
> > This patchset introduces a new utility library inside perf/util, which
> > provides a work queue abstraction, which loosely follows the Kernel
> > workqueue API.
> >
> > The workqueue abstraction is made up of two components:
> > - threadpool: which takes care of managing a pool of threads. It is
> > inspired by the prototype for threaded trace in perf-record from Alexey:
> >
> > https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@xxxxxxxxxxxxxxx/
> > - workqueue: manages a shared queue and provides the workers
> > implementation.
> >
> > On top of the workqueue, a simple parallel-for utility is implemented
> > which is then showcased in synthetic-events.c, replacing the previous
> > manual pthread-created threads.
> >
> > Through some experiments with perf bench, I can see how the new
> > workqueue has a higher overhead compared to manual creation of threads,
> > but is able to more effectively partition work among threads, yielding
> > a better result with more threads.
> > Furthermore, the overhead could be configured by changing the
> > `work_size` (currently 1), aka the number of dirents that are
> > processed by a thread before grabbing a lock to get the new work item.
> > I experimented with different sizes but, while bigger sizes reduce overhead
> > as expected, they do not scale as well to more threads.
> >
> > I tried to keep the patchset as simple as possible, deferring possible
> > improvements and features to future work.
> > Naming a few:
> > - in order to achieve better performance, we could consider using
> > work-stealing instead of a common queue.
> > - affinities in the thread pool, as in Alexey's prototype for
> > perf-record. Doing so would enable reusing the same threadpool for
> > different purposes (evlist open, threaded trace, synthetic threads),
> > avoiding having to spin up threads multiple times.
> > - resizable threadpool, e.g. for lazy spawning of threads.
> >
> > @Arnaldo
> > Since I wanted the workqueue to provide a similar API to the Kernel's
> > workqueue, I followed the naming style I found there, instead of the
> > usual object__method style that is typically found in perf.
> > Let me know if you'd like me to follow perf style instead.
> >
> > Thanks,
> > Riccardo
> >
> > Riccardo Mancini (10):
> > perf workqueue: threadpool creation and destruction
> > perf tests: add test for workqueue
> > perf workqueue: add threadpool start and stop functions
> > perf workqueue: add threadpool execute and wait functions
> > perf workqueue: add sparse annotation header
> > perf workqueue: introduce workqueue struct
> > perf workqueue: implement worker thread and management
> > perf workqueue: add queue_work and flush_workqueue functions
> > perf workqueue: add utility to execute a for loop in parallel
> > perf synthetic-events: use workqueue parallel_for
>
> looks great, would it make sense to put this to libperf?
I don't know about libperf in particular.
The idea is to start using it in perf and, if everything goes well, to put it in
lib/ so that everyone interested in it could just include it.
Since I'm looking for other parts where a workqueue could be useful, if you know
of some in libperf, I could try having a look at them too.
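
To make the `work_size` trade-off from the cover letter a bit more concrete,
here is a minimal standalone sketch of the chunking idea: each thread takes a
lock, claims up to `work_size` consecutive indices from a shared counter, and
processes them without holding the lock. It uses plain pthreads rather than
the workqueue, and all names in it (chunked_for, chunked_parallel_for,
count_visit, MAX_THREADS) are made up for illustration; none of them is the
API from the patchset.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64	/* hypothetical limit, for illustration only */

/* Shared state for one parallel loop (hypothetical, not the patchset API). */
struct chunked_for {
	pthread_mutex_t lock;
	size_t next;		/* first index not yet handed out */
	size_t end;		/* one past the last index */
	size_t work_size;	/* indices claimed per lock acquisition */
	void (*fn)(size_t idx, void *arg);
	void *arg;
};

static void *chunked_for_worker(void *data)
{
	struct chunked_for *cf = data;

	for (;;) {
		size_t start, stop, i;

		/* Claim the next chunk under the lock. */
		pthread_mutex_lock(&cf->lock);
		start = cf->next;
		if (cf->end - start < cf->work_size)
			stop = cf->end;
		else
			stop = start + cf->work_size;
		cf->next = stop;
		pthread_mutex_unlock(&cf->lock);

		if (start == stop)
			break;	/* nothing left to hand out */

		/* Process the claimed chunk without holding the lock. */
		for (i = start; i < stop; i++)
			cf->fn(i, cf->arg);
	}
	return NULL;
}

/* Run fn(i, arg) for every i in [0, nr_items); returns 0, or -1 on bad args. */
static int chunked_parallel_for(size_t nr_items, size_t work_size,
				int nr_threads,
				void (*fn)(size_t idx, void *arg), void *arg)
{
	pthread_t threads[MAX_THREADS];
	struct chunked_for cf = {
		.next = 0, .end = nr_items, .work_size = work_size,
		.fn = fn, .arg = arg,
	};
	int i, started = 0;

	if (work_size == 0 || nr_threads < 1 || nr_threads > MAX_THREADS)
		return -1;

	pthread_mutex_init(&cf.lock, NULL);
	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&threads[started], NULL,
				   chunked_for_worker, &cf))
			break;
		started++;
	}
	/* The calling thread helps too, so work completes even if no thread started. */
	chunked_for_worker(&cf);
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&cf.lock);
	return 0;
}

/* Example callback: count how many times each index is visited. */
static void count_visit(size_t idx, void *arg)
{
	int *seen = arg;

	seen[idx]++;	/* each index is claimed by exactly one thread */
}
```

With work_size set to 1 the lock is taken once per item, which is the
higher-overhead but finer-grained case; a larger work_size takes the lock
less often but hands out coarser chunks, so the tail of the loop can end up
unevenly split among threads.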
Riccardo
>
> jirka
>
> >
> > tools/perf/tests/Build | 1 +
> > tools/perf/tests/builtin-test.c | 9 +
> > tools/perf/tests/tests.h | 3 +
> > tools/perf/tests/workqueue.c | 453 +++++++++++++++++
> > tools/perf/util/Build | 1 +
> > tools/perf/util/synthetic-events.c | 131 +++--
> > tools/perf/util/workqueue/Build | 2 +
> > tools/perf/util/workqueue/sparse.h | 21 +
> > tools/perf/util/workqueue/threadpool.c | 516 ++++++++++++++++++++
> > tools/perf/util/workqueue/threadpool.h | 29 ++
> > tools/perf/util/workqueue/workqueue.c | 642 +++++++++++++++++++++++++
> > tools/perf/util/workqueue/workqueue.h | 38 ++
> > 12 files changed, 1771 insertions(+), 75 deletions(-)
> > create mode 100644 tools/perf/tests/workqueue.c
> > create mode 100644 tools/perf/util/workqueue/Build
> > create mode 100644 tools/perf/util/workqueue/sparse.h
> > create mode 100644 tools/perf/util/workqueue/threadpool.c
> > create mode 100644 tools/perf/util/workqueue/threadpool.h
> > create mode 100644 tools/perf/util/workqueue/workqueue.c
> > create mode 100644 tools/perf/util/workqueue/workqueue.h
> >
> > --
> > 2.31.1
> >
>