Re: [RFC/RFT][PATCH v5] cpuidle: New timer events oriented governor for tickless systems

From: Giovanni Gherdovich
Date: Sat Nov 10 2018 - 14:06:06 EST


On Thu, 2018-11-08 at 18:25 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Subject: [PATCH] cpuidle: New timer events oriented governor for tickless systems
>
> The venerable menu governor does some things that are quite
> questionable in my view.
>
> First, it includes timer wakeups in the pattern detection data and
> mixes them up with wakeups from other sources which in some cases
> causes it to expect what essentially would be a timer wakeup in a
> time frame in which no timer wakeups are possible (because it knows
> the time until the next timer event and that is later than the
> expected wakeup time).
>
> Second, it uses the extra exit latency limit based on the predicted
> idle duration and depending on the number of tasks waiting on I/O,
> even though those tasks may run on a different CPU when they are
> woken up.  Moreover, the time ranges used by it for the sleep length
> correction factors depend on whether or not there are tasks waiting
> on I/O, which again doesn't imply anything in particular, and they
> are not correlated to the list of available idle states in any way
> whatever.
>
> Also, the pattern detection code in menu may end up considering
> values that are too large to matter at all, in which cases running
> it is a waste of time.
>
> A major rework of the menu governor would be required to address
> these issues and the performance of at least some workloads (tuned
> specifically to the current behavior of the menu governor) is likely
> to suffer from that.  It is thus better to introduce an entirely new
> governor without them and let everybody use the governor that works
> better with their actual workloads.
>
> The new governor introduced here, the timer events oriented (TEO)
> governor, uses the same basic strategy as menu: it always tries to
> find the deepest idle state that can be used in the given conditions.
> However, it applies a different approach to that problem.
>
> First, it doesn't use "correction factors" for the time till the
> closest timer, but instead it tries to correlate the measured idle
> duration values with the available idle states and use that
> information to pick up the idle state that is most likely to "match"
> the upcoming CPU idle interval.
>
> Second, it doesn't take the number of "I/O waiters" into account at
> all and the pattern detection code in it avoids taking timer wakeups
> into account.  It also only uses idle duration values less than the
> current time till the closest timer (with the tick excluded) for that
> purpose.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
>
> v4 -> v5:
> * Avoid using shallow idle states when the tick has been stopped already.
>
> v3 -> v4:
> * Make the pattern detection avoid returning too early if the minimum
>   sample is too far from the average.
> * Reformat the changelog (as requested by Peter).
>
> v2 -> v3:
> * Simplify the pattern detection code and make it return a value
>   lower than the time to the closest timer if the majority of recent
>   idle intervals are below it regardless of their variance (that should
>   cause it to be slightly more aggressive).
> * Do not count wakeups from state 0 due to the time limit in poll_idle()
>   as non-timer.
>
> [...]

[NOTE: the tables in this message are quite wide, ~130 columns. If this
doesn't get to you properly formatted you can read a copy of this message at
the URL https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html ]


Hello Rafael,

I have results for v3 and v5. Regarding v4, I made a mistake and didn't get
valid data; as I saw v5 coming shortly after, I didn't rerun v4.

I'm replying to the v5 thread because that's where these results belong, but
I'm quoting your text from the v2 email at
https://lore.kernel.org/lkml/4168371.zz0pVZtGOY@xxxxxxxxxxxxxx so that it's
easier to follow along.

The quick summary is:

---> sockperf on loopback over UDP, mode "throughput":
     this had a 12% regression in v2 on 48x-HASWELL-NUMA, which is completely
     recovered in v3 and v5. Good stuff.

---> dbench on xfs:
     this was down 16% in v2 on 48x-HASWELL-NUMA. On v5 we're at a 10%
     regression. Slight improvement. What's really hurting here is the
     single-client scenario.

---> netperf-udp on loopback:
     had a 6% regression in v2 on 8x-SKYLAKE-UMA, which is the same as what
     happens in v5.

---> tbench on loopback:
     was down 10% in v2 on 8x-SKYLAKE-UMA, now slightly worse in v5 with a 12%
     regression. As in dbench, it's at low client counts that the results
     are worst. Note that this machine is different from the one that has the
     dbench regression.

A more detailed report follows below.

I maintain my original opinion from v2 that this governor is largely
performance-neutral and I'm not overly worried about the numbers above:

* results change from machine to machine: dbench is down 10% on
  48x-HASWELL-NUMA, but it also gives you the largest win on the board with a
  4% improvement on 8x-SKYLAKE-UMA. All the regressions I mention manifest on
  only one of the three machines.

* similar benchmarks give contradictory results: dbench seems highly sensitive
  to this patch, but pgbench, sqlite, and fio are not. netperf-udp is slightly
  down on 48x-HASWELL-NUMA but sockperf-udp-throughput has benefited from v5
  on that same machine.

To raise an alert from the performance angle I would have to see red on my
board from an entire category of benchmarks (i.e. I/O, networking,
scheduler-intensive, etc.) and on a variety of hardware configurations. That's
not happening here.

On Sun, 2018-11-04 at 11:06 +0100, Rafael J. Wysocki wrote:
> On Wednesday, October 31, 2018 7:36:21 PM CET Giovanni Gherdovich wrote:
> >
> > [...]
> > I've tested your patches applying them on v4.18 (plus the backport
> > necessary for v2 as Doug helpfully noted), just because it was the latest
> > release when I started preparing this.

I did the same for v3 and v5: baseline is v4.18, using that backport from
linux-next.

>
> > I've tested it on three machines, with different generations of Intel CPUs:
>
> > * single socket E3-1240 v5 (Skylake 8 cores, which I'll call 8x-SKYLAKE-UMA)
> > * two sockets E5-2698 v4 (Broadwell 80 cores, 80x-BROADWELL-NUMA from here onwards)
> > * two sockets E5-2670 v3 (Haswell 48 cores, 48x-HASWELL-NUMA from here onwards)
> >

Same machines.

>
> > BENCHMARKS WITH NEUTRAL RESULTS
> > ===============================
>
> > These are the workloads where no noticeable difference is measured (on both
> > v1 and v2, all machines), together with the corresponding MMTests[1]
> > configuration file name:
>
> > * pgbench read-only on xfs, pgbench read/write on xfs
> >     * global-dhp__db-pgbench-timed-ro-small-xfs
> >     * global-dhp__db-pgbench-timed-rw-small-xfs
> > * siege
> >     * global-dhp__http-siege
> > * hackbench, pipetest
> >     * global-dhp__scheduler-unbound
> > * Linux kernel compilation
> >     * global-dhp__workload_kerndevel-xfs
> > * NASA Parallel Benchmarks, C-Class (linear algebra; run both with OpenMP
> >   and OpenMPI, over xfs)
> >     * global-dhp__nas-c-class-mpi-full-xfs
> >     * global-dhp__nas-c-class-omp-full
> > * FIO (Flexible IO) in several configurations
> >     * global-dhp__io-fio-randread-async-randwrite-xfs
> >     * global-dhp__io-fio-randread-async-seqwrite-xfs
> >     * global-dhp__io-fio-seqread-doublemem-32k-4t-xfs
> >     * global-dhp__io-fio-seqread-doublemem-4k-4t-xfs
> > * netperf on loopback over TCP
> >     * global-dhp__network-netperf-unbound
>
> The above is great to know.

All of the above are confirmed, plus we can add to the group of neutral
benchmarks:

* xfsrepair
    * global-dhp__io-xfsrepair-xfs
* sqlite (insert operations on xfs)
    * global-dhp__db-sqlite-insert-medium-xfs
* schbench
    * global-dhp__workload_schbench
* gitsource on xfs (git unit tests, shell intensive)
    * global-dhp__workload_shellscripts-xfs

>
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: OVERVIEW
> > =============================================
>
> > These are benchmarks which exhibit a variation in their performance;
> > you'll see the magnitude of the changes is moderate and it's highly variable
> > from machine to machine. All percentages refer to the v4.18 baseline. In
> > more than one case the Haswell machine seems to prefer v1 to v2.
> >
> > [...]
> >
> > * netperf on loopback over UDP
> >ÂÂÂÂÂ* global-dhp__network-netperf-unbound
>
> >                                      teo-v1          teo-v2
> >              -------------------------------------------------
> >              8x-SKYLAKE-UMA          no change       6% worse
> >              80x-BROADWELL-NUMA      1% worse        4% worse
> >              48x-HASWELL-NUMA        3% better       5% worse
> >

New data for netperf-udp, as 8x-SKYLAKE-UMA looked slightly off:

* netperf on loopback over UDP
    * global-dhp__network-netperf-unbound

                        teo-v1          teo-v2          teo-v3          teo-v5
 ------------------------------------------------------------------------------
 8x-SKYLAKE-UMA        no change       6% worse        4% worse        6% worse
 80x-BROADWELL-NUMA    1% worse        4% worse        no change       no change
 48x-HASWELL-NUMA      3% better       5% worse        7% worse        5% worse


> > [...]
> >
> > * sockperf on loopback over UDP, mode "throughput"
> >ÂÂÂÂÂ* global-dhp__network-sockperf-unbound
>
> Generally speaking, I'm not worried about single-digit percent differences,
> because overall they tend to fall into the noise range in the grand picture.
>
> >                                      teo-v1          teo-v2
> >              -------------------------------------------------
> >              8x-SKYLAKE-UMA          1% worse        1% worse
> >              80x-BROADWELL-NUMA      3% better       2% better
> >              48x-HASWELL-NUMA        4% better       12% worse
>
> But the 12% difference here is slightly worrisome.

Following up on the above:

* sockperf on loopback over UDP, mode "throughput"
    * global-dhp__network-sockperf-unbound

                        teo-v1          teo-v2          teo-v3          teo-v5
 ------------------------------------------------------------------------------
 8x-SKYLAKE-UMA        1% worse        1% worse        1% worse        1% worse
 80x-BROADWELL-NUMA    3% better       2% better       5% better       3% worse
 48x-HASWELL-NUMA      4% better       12% worse       no change       no change

> >
> > [...]
>
> > * dbench on xfs
> >         * global-dhp__io-dbench4-async-xfs
>
> >                                      teo-v1          teo-v2
> >              -------------------------------------------------
> >              8x-SKYLAKE-UMA          3% better       4% better
> >              80x-BROADWELL-NUMA      no change       no change
> >              48x-HASWELL-NUMA        6% worse        16% worse
>
> And same here.

With new data:

* dbench on xfs
    * global-dhp__io-dbench4-async-xfs

                        teo-v1          teo-v2          teo-v3          teo-v5
 ------------------------------------------------------------------------------
 8x-SKYLAKE-UMA        3% better       4% better       6% better       4% better
 80x-BROADWELL-NUMA    no change       no change       1% worse        3% worse
 48x-HASWELL-NUMA      6% worse        16% worse       8% worse        10% worse


>
> > * tbench on loopback
> >     * global-dhp__network-tbench
>
> >                                      teo-v1          teo-v2
> >              -------------------------------------------------
> >              8x-SKYLAKE-UMA          1% worse        10% worse
> >              80x-BROADWELL-NUMA      1% worse        1% worse
> >              48x-HASWELL-NUMA        1% worse        2% worse
> >

Update on tbench:

* tbench on loopback
    * global-dhp__network-tbench

                        teo-v1          teo-v2          teo-v3          teo-v5
 ------------------------------------------------------------------------------
 8x-SKYLAKE-UMA        1% worse        10% worse       11% worse       12% worse
 80x-BROADWELL-NUMA    1% worse        1% worse        no change       1% worse
 48x-HASWELL-NUMA      1% worse        2% worse        1% worse        1% worse

> > [...]
>
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: DETAIL
> > ===========================================
>
> > Now some more detail. Each benchmark is run in a variety of configurations
> > (eg. number of threads, number of concurrent connections and so forth) each
> > of them giving a result. What you see above is the geometric mean of
> > "sub-results"; below is the detailed view where there was a regression
> > larger than 5% (either in v1 or v2, on any of the machines). That means
> > I'll exclude xfsrepair, sqlite, schbench and the git unit tests "gitsource"
> > that have negligible swings from the baseline.
>
> > In all tables asterisks indicate a statement about statistical
> > significance: the difference with baseline has a p-value smaller than 0.1
> > (small p-values indicate that the difference is real and not just random
> > noise).
>
> > NETPERF-UDP
> > ===========
> > NOTES: Test run in mode "stream" over UDP. The varying parameter is the
> >     message size in bytes. Each measurement is taken 5 times and the
> >     harmonic mean is reported.
> > MEASURES: Throughput in MBits/second, both on the sender and on the receiver end.
> > HIGHER is better
>
> > machine: 8x-SKYLAKE-UMA
> >                                      4.18.0                 4.18.0                 4.18.0
> >                                     vanilla                 teo-v1        teo-v2+backport
> > -----------------------------------------------------------------------------------------
> > Hmean     send-64         362.27 (   0.00%)      362.87 (   0.16%)      318.85 * -11.99%*
> > Hmean     send-128        723.17 (   0.00%)      723.66 (   0.07%)      660.96 *  -8.60%*
> > Hmean     send-256       1435.24 (   0.00%)     1427.08 (  -0.57%)     1346.22 *  -6.20%*
> > Hmean     send-1024      5563.78 (   0.00%)     5529.90 *  -0.61%*     5228.28 *  -6.03%*
> > Hmean     send-2048     10935.42 (   0.00%)    10809.66 *  -1.15%*    10521.14 *  -3.79%*
> > Hmean     send-3312     16898.66 (   0.00%)    16539.89 *  -2.12%*    16240.87 *  -3.89%*
> > Hmean     send-4096     19354.33 (   0.00%)    19185.43 (  -0.87%)    18600.52 *  -3.89%*
> > Hmean     send-8192     32238.80 (   0.00%)    32275.57 (   0.11%)    29850.62 *  -7.41%*
> > Hmean     send-16384    48146.75 (   0.00%)    49297.23 *   2.39%*    48295.51 (   0.31%)
> > Hmean     recv-64         362.16 (   0.00%)      362.87 (   0.19%)      318.82 * -11.97%*
> > Hmean     recv-128        723.01 (   0.00%)      723.66 (   0.09%)      660.89 *  -8.59%*
> > Hmean     recv-256       1435.06 (   0.00%)     1426.94 (  -0.57%)     1346.07 *  -6.20%*
> > Hmean     recv-1024      5562.68 (   0.00%)     5529.90 *  -0.59%*     5228.28 *  -6.01%*
> > Hmean     recv-2048     10934.36 (   0.00%)    10809.66 *  -1.14%*    10519.89 *  -3.79%*
> > Hmean     recv-3312     16898.65 (   0.00%)    16538.21 *  -2.13%*    16240.86 *  -3.89%*
> > Hmean     recv-4096     19351.99 (   0.00%)    19183.17 (  -0.87%)    18598.33 *  -3.89%*
> > Hmean     recv-8192     32238.74 (   0.00%)    32275.13 (   0.11%)    29850.39 *  -7.41%*
> > Hmean     recv-16384    48146.59 (   0.00%)    49296.23 *   2.39%*    48295.03 (   0.31%)
>
> That is a bit worse than I would like it to be TBH.

Update on netperf-udp:

machine: 8x-SKYLAKE-UMA
                                     4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                                    vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
---------------------------------------------------------------------------------------------------------------------------------------
Hmean     send-64         362.27 (   0.00%)      362.87 (   0.16%)      318.85 * -11.99%*      347.08 *  -4.19%*      333.48 *  -7.95%*
Hmean     send-128        723.17 (   0.00%)      723.66 (   0.07%)      660.96 *  -8.60%*      676.46 *  -6.46%*      650.71 * -10.02%*
Hmean     send-256       1435.24 (   0.00%)     1427.08 (  -0.57%)     1346.22 *  -6.20%*     1359.59 *  -5.27%*     1323.83 *  -7.76%*
Hmean     send-1024      5563.78 (   0.00%)     5529.90 *  -0.61%*     5228.28 *  -6.03%*     5382.04 *  -3.27%*     5271.99 *  -5.24%*
Hmean     send-2048     10935.42 (   0.00%)    10809.66 *  -1.15%*    10521.14 *  -3.79%*    10610.29 *  -2.97%*    10544.58 *  -3.57%*
Hmean     send-3312     16898.66 (   0.00%)    16539.89 *  -2.12%*    16240.87 *  -3.89%*    16271.23 *  -3.71%*    15968.89 *  -5.50%*
Hmean     send-4096     19354.33 (   0.00%)    19185.43 (  -0.87%)    18600.52 *  -3.89%*    18692.16 *  -3.42%*    18408.69 *  -4.89%*
Hmean     send-8192     32238.80 (   0.00%)    32275.57 (   0.11%)    29850.62 *  -7.41%*    30066.83 *  -6.74%*    29824.62 *  -7.49%*
Hmean     send-16384    48146.75 (   0.00%)    49297.23 *   2.39%*    48295.51 (   0.31%)    48800.37 *   1.36%*    48247.73 (   0.21%)
Hmean     recv-64         362.16 (   0.00%)      362.87 (   0.19%)      318.82 * -11.97%*      347.07 *  -4.17%*      333.48 *  -7.92%*
Hmean     recv-128        723.01 (   0.00%)      723.66 (   0.09%)      660.89 *  -8.59%*      676.39 *  -6.45%*      650.63 * -10.01%*
Hmean     recv-256       1435.06 (   0.00%)     1426.94 (  -0.57%)     1346.07 *  -6.20%*     1359.45 *  -5.27%*     1323.81 *  -7.75%*
Hmean     recv-1024      5562.68 (   0.00%)     5529.90 *  -0.59%*     5228.28 *  -6.01%*     5381.37 *  -3.26%*     5271.45 *  -5.24%*
Hmean     recv-2048     10934.36 (   0.00%)    10809.66 *  -1.14%*    10519.89 *  -3.79%*    10610.28 *  -2.96%*    10544.58 *  -3.56%*
Hmean     recv-3312     16898.65 (   0.00%)    16538.21 *  -2.13%*    16240.86 *  -3.89%*    16269.34 *  -3.72%*    15967.13 *  -5.51%*
Hmean     recv-4096     19351.99 (   0.00%)    19183.17 (  -0.87%)    18598.33 *  -3.89%*    18690.13 *  -3.42%*    18407.45 *  -4.88%*
Hmean     recv-8192     32238.74 (   0.00%)    32275.13 (   0.11%)    29850.39 *  -7.41%*    30062.78 *  -6.75%*    29824.30 *  -7.49%*
Hmean     recv-16384    48146.59 (   0.00%)    49296.23 *   2.39%*    48295.03 (   0.31%)    48786.88 *   1.33%*    48246.71 (   0.21%)

Here is a plot of the raw benchmark data, where you can better see the
distribution and variability of the results:
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#netperf-udp

> > [...]
> >
> > SOCKPERF-UDP-THROUGHPUT
> > =======================
> > NOTES: Test run in mode "throughput" over UDP. The varying parameter is the
> >     message size.
> > MEASURES: Throughput, in MBits/second
> > HIGHER is better
>
> > machine: 48x-HASWELL-NUMA
> >                               4.18.0                 4.18.0                 4.18.0
> >                              vanilla                 teo-v1        teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Hmean     14        48.16 (   0.00%)       50.94 *   5.77%*       42.50 * -11.77%*
> > Hmean     100      346.77 (   0.00%)      358.74 *   3.45%*      303.31 * -12.53%*
> > Hmean     300     1018.06 (   0.00%)     1053.75 *   3.51%*      895.55 * -12.03%*
> > Hmean     500     1693.07 (   0.00%)     1754.62 *   3.64%*     1489.61 * -12.02%*
> > Hmean     850     2853.04 (   0.00%)     2948.73 *   3.35%*     2473.50 * -13.30%*
>
> Well, in this case the consistent improvement in v1 turned into a consistent decline
> in the v2, and over 10% for that matter.ÂÂNeeds improvement IMO.

Update: this one got resolved in v5,

machine: 48x-HASWELL-NUMA
                              4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                             vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Hmean     14        48.16 (   0.00%)       50.94 *   5.77%*       42.50 * -11.77%*       48.91 *   1.55%*       49.06 *   1.87%*
Hmean     100      346.77 (   0.00%)      358.74 *   3.45%*      303.31 * -12.53%*      350.75 (   1.15%)      347.52 (   0.22%)
Hmean     300     1018.06 (   0.00%)     1053.75 *   3.51%*      895.55 * -12.03%*     1014.00 (  -0.40%)     1023.99 (   0.58%)
Hmean     500     1693.07 (   0.00%)     1754.62 *   3.64%*     1489.61 * -12.02%*     1688.50 (  -0.27%)     1698.43 (   0.32%)
Hmean     850     2853.04 (   0.00%)     2948.73 *   3.35%*     2473.50 * -13.30%*     2836.13 (  -0.59%)     2767.66 *  -2.99%*

plots of raw data at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#sockperf-udp-throughput

>
> > DBENCH4
> > =======
> > NOTES: asynchronous I/O; varies the number of clients up to NUMCPUS*8.
> > MEASURES: latency (millisecs)
> > LOWER is better
>
> > machine: 48x-HASWELL-NUMA
> >                               4.18.0                 4.18.0                 4.18.0
> >                              vanilla                 teo-v1        teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Amean      1        37.15 (   0.00%)       50.10 ( -34.86%)       39.02 (  -5.03%)
> > Amean      2        43.75 (   0.00%)       45.50 (  -4.01%)       44.36 (  -1.39%)
> > Amean      4        54.42 (   0.00%)       58.85 (  -8.15%)       58.17 (  -6.89%)
> > Amean      8        75.72 (   0.00%)       74.25 (   1.94%)       82.76 (  -9.30%)
> > Amean      16      116.56 (   0.00%)      119.88 (  -2.85%)      164.14 ( -40.82%)
> > Amean      32      570.02 (   0.00%)      561.92 (   1.42%)      681.94 ( -19.63%)
> > Amean      64     3185.20 (   0.00%)     3291.80 (  -3.35%)     4337.43 ( -36.17%)
>
> This one too.

Update:

machine: 48x-HASWELL-NUMA
                              4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                             vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Amean      1        37.15 (   0.00%)       50.10 ( -34.86%)       39.02 (  -5.03%)       52.24 ( -40.63%)       51.62 ( -38.96%)
Amean      2        43.75 (   0.00%)       45.50 (  -4.01%)       44.36 (  -1.39%)       47.25 (  -8.00%)       44.20 (  -1.03%)
Amean      4        54.42 (   0.00%)       58.85 (  -8.15%)       58.17 (  -6.89%)       55.12 (  -1.29%)       58.07 (  -6.70%)
Amean      8        75.72 (   0.00%)       74.25 (   1.94%)       82.76 (  -9.30%)       78.63 (  -3.84%)       85.33 ( -12.68%)
Amean      16      116.56 (   0.00%)      119.88 (  -2.85%)      164.14 ( -40.82%)      124.87 (  -7.13%)      124.54 (  -6.85%)
Amean      32      570.02 (   0.00%)      561.92 (   1.42%)      681.94 ( -19.63%)      568.93 (   0.19%)      571.23 (  -0.21%)
Amean      64     3185.20 (   0.00%)     3291.80 (  -3.35%)     4337.43 ( -36.17%)     3181.13 (   0.13%)     3382.48 (  -6.19%)

It doesn't do well in the single-client scenario; v2 was a lot better at
that, but on the other hand it suffered at saturation (64 clients on a
48-core box). Plot at

https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#dbench4

>
> > TBENCH4
> > =======
> > NOTES: networking counterpart of dbench. Varies the number of clients up to NUMCPUS*4
> > MEASURES: Throughput, MB/sec
> > HIGHER is better
>
> > machine: 8x-SKYLAKE-UMA
> >                                     4.18.0                 4.18.0                 4.18.0
> >                                    vanilla                    teo        teo-v2+backport
> > ----------------------------------------------------------------------------------------
> > Hmean     mb/sec-1       620.52 (   0.00%)      613.98 *  -1.05%*      502.47 * -19.03%*
> > Hmean     mb/sec-2      1179.05 (   0.00%)     1112.84 *  -5.62%*      820.57 * -30.40%*
> > Hmean     mb/sec-4      2072.29 (   0.00%)     2040.55 *  -1.53%*     2036.11 *  -1.75%*
> > Hmean     mb/sec-8      4238.96 (   0.00%)     4205.01 *  -0.80%*     4124.59 *  -2.70%*
> > Hmean     mb/sec-16     3515.96 (   0.00%)     3536.23 *   0.58%*     3500.02 *  -0.45%*
> > Hmean     mb/sec-32     3452.92 (   0.00%)     3448.94 *  -0.12%*     3428.08 *  -0.72%*
>
>
> And same here.

New data:

machine: 8x-SKYLAKE-UMA
                                    4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                                   vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------------
Hmean     mb/sec-1       620.52 (   0.00%)      613.98 *  -1.05%*      502.47 * -19.03%*      492.77 * -20.59%*      464.52 * -25.14%*
Hmean     mb/sec-2      1179.05 (   0.00%)     1112.84 *  -5.62%*      820.57 * -30.40%*      831.23 * -29.50%*      780.97 * -33.76%*
Hmean     mb/sec-4      2072.29 (   0.00%)     2040.55 *  -1.53%*     2036.11 *  -1.75%*     2016.97 *  -2.67%*     2019.79 *  -2.53%*
Hmean     mb/sec-8      4238.96 (   0.00%)     4205.01 *  -0.80%*     4124.59 *  -2.70%*     4098.06 *  -3.32%*     4171.64 *  -1.59%*
Hmean     mb/sec-16     3515.96 (   0.00%)     3536.23 *   0.58%*     3500.02 *  -0.45%*     3438.60 *  -2.20%*     3456.89 *  -1.68%*
Hmean     mb/sec-32     3452.92 (   0.00%)     3448.94 *  -0.12%*     3428.08 *  -0.72%*     3369.30 *  -2.42%*     3430.09 *  -0.66%*

Again, the pain point is at low client counts; v1, OTOH, was neutral. Plot at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#tbench4


Cheers,
Giovanni