Re: [PATCH] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs

From: John Garry
Date: Tue Mar 10 2020 - 07:33:25 EST


On 20/01/2020 19:24, Thomas Gleixner wrote:
Marc,

Marc Zyngier <maz@xxxxxxxxxx> writes:
We're stuck between a rock and a hard place here:

(1) We place all interrupts on the least loaded CPU that matches
the affinity -> results in performance issues on some funky
HW (like D05's SAS controller).

(2) We place managed interrupts on the least loaded CPU that matches
the affinity -> we have artificial load on NUMA boundaries, and
reduced spread of overlapping managed interrupts.

(3) We don't account for non-managed LPIs, and we run the risk of
unpredictable performance because we don't really know where
the *other* interrupts are.

My personal preference would be to go for (1), as in my original post.
I find (3) the least appealing, because we don't track things anymore.
(2) feels like "the least of all evils", as it is a decent performance
gain, seems to give predictable performance, and doesn't regress lesser
systems...

I'm definitely open to suggestions here.

The way x86 does it, and that's mostly OK except for some really broken
setups, is:

1) Non-managed interrupts:

If the interrupt is bound to a node, then we try to find a target:

I) in the intersection of affinity mask and node mask.

II) in the node mask itself.

Yes, we ignore the affinity mask there, because that's pretty much
the same as if the given affinity does not contain an online
CPU.

If all of that fails, then we try the nodeless mode.

If the interrupt is not bound to a node, then we try to find a target:

I) in the intersection of affinity mask and online mask.

II) in the online mask itself.

Each step searches for the CPU in the searched mask which has the
least number of total interrupts assigned.

2) Managed interrupts

For managed interrupts we just search in the intersection of assigned
mask and online CPUs for the CPU with the least number of managed
interrupts.

If no CPU is online, then the interrupt is shut down anyway, so no
fallback is required.

Don't know whether that's something you can map to ARM64, but I assume
the principle of trying to enforce NUMA locality plus balancing the
number of interrupts makes sense in general.
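
For reference, the selection order described above could be sketched roughly
as below. This is a standalone C illustration, not the kernel code: the names
least_loaded(), pick_nonmanaged() and pick_managed(), the flat 64-bit CPU
masks and the per-CPU counter arrays are all assumptions made for the example,
as is restricting every candidate mask to online CPUs.

```c
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical per-CPU counters; not the kernel's data structures. */
unsigned int total_irqs[NR_CPUS];   /* all interrupts targeting a CPU */
unsigned int managed_irqs[NR_CPUS]; /* managed interrupts only */

/* Return the CPU in 'mask' with the smallest count, or -1 if 'mask' is empty. */
int least_loaded(uint64_t mask, const unsigned int *count)
{
    int best = -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(mask & (1ULL << cpu)))
            continue;
        if (best < 0 || count[cpu] < count[best])
            best = cpu;
    }
    return best;
}

/*
 * Non-managed interrupts. A node_mask of 0 means "not bound to a node"
 * (an assumption of this sketch).
 */
int pick_nonmanaged(uint64_t affinity, uint64_t node_mask, uint64_t online)
{
    int cpu;

    if (node_mask) {
        /* I) intersection of affinity mask and node mask */
        cpu = least_loaded(affinity & node_mask & online, total_irqs);
        if (cpu >= 0)
            return cpu;

        /* II) the node mask itself, ignoring the affinity mask */
        cpu = least_loaded(node_mask & online, total_irqs);
        if (cpu >= 0)
            return cpu;
        /* Both failed: fall through to the nodeless mode. */
    }

    /* I) intersection of affinity mask and online mask */
    cpu = least_loaded(affinity & online, total_irqs);
    if (cpu >= 0)
        return cpu;

    /* II) the online mask itself */
    return least_loaded(online, total_irqs);
}

/*
 * Managed interrupts: only the assigned mask restricted to online CPUs is
 * searched, ranked by managed interrupt count. A result of -1 means no
 * assigned CPU is online, i.e. the interrupt would be shut down, so there
 * is no fallback.
 */
int pick_managed(uint64_t assigned, uint64_t online)
{
    return least_loaded(assigned & online, managed_irqs);
}
```

With CPUs 0-3 online and total counts of 5, 2, 7 and 1, a non-managed
interrupt with affinity {0} bound to node {2,3} would land on CPU 3 here: the
affinity/node intersection is empty, so the node mask alone is searched.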


Hi Marc,

I was wondering if there is anything we can do to progress this patch?

Apart from it being a good change in itself, I need to do some SMMU testing for next-gen product development, and I would like to include this patch, preferably from mainline.

Cheers,
John