Re: [RFC patch v3 00/20] Cache aware scheduling

From: Chen, Yu C
Date: Thu Jun 19 2025 - 09:23:23 EST


On 6/19/2025 2:39 PM, Yangyu Chen wrote:
Nice work!

I've tested your patch on top of commit fb4d33ab452e and found it
incredibly helpful for Verilator with large RTL simulations such as
XiangShan [1] on AMD EPYC Genoa.

I've created a simple benchmark [2] using a static build of an
8-thread Verilator of XiangShan. Simply clone the repository and
run `make run`.

In a statically allocated 8-CCX KVM guest (128 vCPUs in total) on an
EPYC 9T24, the simulation time before the patch was 49.348ms. This
was because the threads were distributed across every CCX, resulting
in extremely high core-to-core latency. After applying the patch,
all 8 Verilator threads are placed on a single CCX, and the
simulation time dropped to 24.196ms, a remarkable 2.03x speedup. We
don't need numactl anymore!
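For anyone reproducing the pre-patch numbers, the old workaround was to
pin the process to one CCX by hand with numactl. A minimal sketch,
assuming the 128 vCPUs are split evenly across the 8 CCXs as 16
contiguous vCPUs each (the exact topology and binary path are
assumptions; check lscpu on your system):

```shell
# Compute the vCPU range for a given CCX, assuming 128 vCPUs split
# evenly across 8 CCXs (16 contiguous vCPUs per CCX, as in this guest).
ccx=0
cpus_per_ccx=$((128 / 8))
start=$((ccx * cpus_per_ccx))
end=$((start + cpus_per_ccx - 1))
echo "${start}-${end}"

# The manual workaround was then roughly (binary path is hypothetical):
#   numactl --physcpubind="${start}-${end}" ./path/to/verilator-binary
```

With the patch applied, the scheduler makes this manual pinning
unnecessary.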

[1] https://github.com/OpenXiangShan/XiangShan
[2] https://github.com/cyyself/chacha20-xiangshan

Tested-by: Yangyu Chen <cyy@xxxxxxxxxxxx>


Thanks Yangyu for your test. May I know whether these 8 threads
share any data with each other, or whether each thread has its own
dedicated data? Or is there 1 main thread while the other 7 threads
do the chacha20 rotation and pass their results back to the main
thread?
Anyway, I tested it on a Xeon EMR with turbo disabled and saw ~20%
reduction in the total time.

Thanks,
Chenyu