Re: [PATCH v2] Reorder some fields in struct rq.
From: Madadi Vineeth Reddy
Date: Wed Aug 13 2025 - 03:34:30 EST
Hi Blake,
On 31/07/25 02:26, Blake Jones wrote:
> This colocates some hot fields in "struct rq" to be on the same cache line
> as others that are often accessed at the same time or in similar ways.
>
[..snip..]
>
> This patch does not change the size of "struct rq" on machines with 64-byte
> cache lines. The additional "____cacheline_aligned" to put the runqueue
> lock on the next cache line will add an additional 64 bytes of padding on
> machines with 128-byte cache lines; although this is unfortunate, it seemed
> more likely to lead to stably good performance than e.g. by just putting
> the runqueue lock somewhere in the middle of the structure and hoping it
> wasn't on an otherwise busy cache line.
On Power11, which has 128-byte cache lines, this change introduces an
88-byte hole because __lock now sits on its own cache line, so struct rq
takes one more cache line than before.
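For illustration only, here is a minimal sketch of how an aligned member
can open such a hole. It uses a toy struct, not the real "struct rq"
layout: the names toy_rq_packed, toy_rq_aligned, TOY_CACHELINE and the
helper functions are all hypothetical, and the 40 bytes of leading fields
are an assumption chosen so that the numbers are easy to follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch (128 bytes, as on Power11). */
#define TOY_CACHELINE 128

/* Toy layout without forced alignment: everything fits in one line. */
struct toy_rq_packed {
	uint64_t hot[5];	/* 40 bytes of hypothetical hot fields */
	int lock;		/* stand-in for the runqueue lock */
	uint64_t tail;		/* stand-in for the remaining fields */
};

/* Same fields, but the lock is pushed to the start of the next line. */
struct toy_rq_aligned {
	uint64_t hot[5];
	int lock __attribute__((aligned(TOY_CACHELINE)));
	uint64_t tail;
};

/* Bytes of padding between the end of hot[] and the aligned lock. */
size_t toy_hole_before_lock(void)
{
	return offsetof(struct toy_rq_aligned, lock) -
	       sizeof(((struct toy_rq_aligned *)0)->hot);
}

/* Number of cache lines needed to hold an object of the given size. */
size_t toy_lines_used(size_t size)
{
	return (size + TOY_CACHELINE - 1) / TOY_CACHELINE;
}
```

In this toy case the aligned variant leaves a hole of 128 - 40 bytes in
front of the lock and grows from one cache line to two, which is the same
kind of effect as described above. The real offsets in "struct rq" depend
on the kernel config and architecture.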
Tested with your custom test case (thanks for sharing) and observed a ~5%
decrease in the number of cycles, along with a slight increase in user
time; both are positive indicators.
Also ran ebizzy, which doesn’t seem to be impacted. I think it would be good
to run a set of standard benchmarks like schbench, ebizzy, hackbench, and
stress-ng, along with a real-life workload, to ensure there’s no negative
impact. I saw that hackbench was tried, but including those numbers would
be helpful.
Reviewed-by: Madadi Vineeth Reddy <vineethr@xxxxxxxxxxxxx>
Tested-by: Madadi Vineeth Reddy <vineethr@xxxxxxxxxxxxx>
Thanks,
Madadi Vineeth Reddy
>
> I ran "hackbench" to test this change, but it didn't show very conclusive
> results. Looking at a profile of the hackbench run, it was spending 95% of
> its cycles inside __alloc_skb(), __kfree_skb(), or kmem_cache_free() -
> almost all of which was spent updating memcg counters or contending on the
> list_lock in kmem_cache_node. In contrast, it spent less than 0.5% of its
> cycles inside either schedule() or try_to_wake_up(). So it's not surprising
> that it didn't show useful results here.
>
[..snip..]
> @@ -1182,8 +1199,6 @@ struct rq {
> struct root_domain *rd;
> struct sched_domain __rcu *sd;
>
> - unsigned long cpu_capacity;
> -
> struct balance_callback *balance_callback;
>
> unsigned char nohz_idle_balance;