Re: [PATCH] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs

From: Thomas Gleixner
Date: Mon Jan 20 2020 - 14:24:50 EST


Marc,

Marc Zyngier <maz@xxxxxxxxxx> writes:
> We're stuck between a rock and a hard place here:
>
> (1) We place all interrupts on the least loaded CPU that matches
> the affinity -> results in performance issues on some funky
> HW (like D05's SAS controller).
>
> (2) We place managed interrupts on the least loaded CPU that matches
> the affinity -> we have artificial load on NUMA boundaries, and
> reduced spread of overlapping managed interrupts.
>
> (3) We don't account for non-managed LPIs, and we run the risk of
> unpredictable performance because we don't really know where
> the *other* interrupts are.
>
> My personal preference would be to go for (1), as in my original post.
> I find (3) the least appealing, because we don't track things anymore.
> (2) feels like "the least of all evils", as it is a decent performance
> gain, seems to give predictable performance, and doesn't regress lesser
> systems...
>
> I'm definitely open to suggestions here.

The way x86 does it, and that's mostly OK except for some really broken
setups, is:

1) Non-managed interrupts:

If the interrupt is bound to a node, then we try to find a target

I) in the intersection of affinity mask and node mask.

II) in the nodemask itself

Yes, we ignore the affinity mask there, because that's pretty much
the same as if the given affinity does not contain an online
CPU.

If all of that fails, then we try the nodeless mode.

If the interrupt is not bound to a node, then we try to find a target

I) in the intersection of affinity mask and online mask.

II) in the onlinemask itself

Each step searches for the CPU in the searched mask which has the
least number of total interrupts assigned.
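
To make the fallback order concrete, here is a minimal user-space sketch
of the non-managed search described above. It is not the x86 code: the
64-bit `cpu_bits_t` mask, `NR_CPUS`, `least_loaded_cpu()` and
`pick_nonmanaged_target()` are hypothetical names made up for
illustration, and restricting the node mask to online CPUs is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8                 /* hypothetical small system */

typedef uint64_t cpu_bits_t;      /* hypothetical: one bit per CPU */

/* CPU in @mask with the fewest interrupts in @count, or -1 if @mask is empty. */
static int least_loaded_cpu(cpu_bits_t mask, const unsigned int *count)
{
	int best = -1;

	assert(count);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & ((cpu_bits_t)1 << cpu)))
			continue;
		/* Ties keep the lowest-numbered CPU. */
		if (best < 0 || count[cpu] < count[best])
			best = cpu;
	}
	return best;
}

/*
 * Non-managed interrupt: node-local search first (if bound to a node),
 * then the nodeless search. @total holds the total number of interrupts
 * assigned to each CPU.
 */
static int pick_nonmanaged_target(cpu_bits_t affinity, cpu_bits_t node_mask,
				  int has_node, cpu_bits_t online,
				  const unsigned int *total)
{
	int cpu;

	if (has_node) {
		/* Assumption of this sketch: only online CPUs are candidates. */
		cpu_bits_t node_online = node_mask & online;

		/* I) intersection of affinity mask and node mask */
		cpu = least_loaded_cpu(affinity & node_online, total);
		if (cpu >= 0)
			return cpu;

		/* II) the node mask itself, ignoring the affinity mask */
		cpu = least_loaded_cpu(node_online, total);
		if (cpu >= 0)
			return cpu;
	}

	/* Nodeless I) intersection of affinity mask and online mask */
	cpu = least_loaded_cpu(affinity & online, total);
	if (cpu >= 0)
		return cpu;

	/* Nodeless II) the online mask itself */
	return least_loaded_cpu(online, total);
}
```

With two nodes of four CPUs each, an interrupt bound to node 0 whose
affinity mask only covers node 1 CPUs ends up on the least loaded CPU of
node 0, which is the "ignore affinity mask" case mentioned above.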

2) Managed interrupts

For managed interrupts we just search in the intersection of assigned
mask and online CPUs for the CPU with the least number of managed
interrupts.

If no CPU is online then the interrupt is shut down anyway, so no
fallback is required.
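
The managed case is simpler and can be sketched the same way. Again this
is only an illustration under assumed names (`cpu_bits_t`, `NR_CPUS`,
`pick_managed_target()` are hypothetical, not kernel API): a single
search over assigned & online, balanced on the per-CPU count of managed
interrupts, with -1 standing in for "nothing online, interrupt gets shut
down".

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8                 /* hypothetical small system */

typedef uint64_t cpu_bits_t;      /* hypothetical: one bit per CPU */

/*
 * Managed interrupt: pick the CPU in (assigned & online) with the fewest
 * managed interrupts. Returns -1 when no assigned CPU is online; there is
 * no fallback to other CPUs.
 */
static int pick_managed_target(cpu_bits_t assigned, cpu_bits_t online,
			       const unsigned int *managed)
{
	cpu_bits_t search = assigned & online;
	int best = -1;

	assert(managed);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(search & ((cpu_bits_t)1 << cpu)))
			continue;
		/* Ties keep the lowest-numbered CPU. */
		if (best < 0 || managed[cpu] < managed[best])
			best = cpu;
	}
	return best;
}
```

Note that only the managed count is consulted here, while the
non-managed search balances on the total number of interrupts per CPU.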

Don't know whether that's something you can map to ARM64, but I assume
the principle of trying to enforce NUMA locality plus balancing the
number of interrupts makes sense in general.

Thanks,

tglx