Re: [PATCH v3 2/3] genirq: Always limit the affinity to online CPUs

From: Marek Szyprowski
Date: Thu Apr 14 2022 - 05:09:46 EST


Hi Marc,

On 13.04.2022 19:26, Marc Zyngier wrote:
> Hi Marek,
>
> On Wed, 13 Apr 2022 15:59:21 +0100,
> Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>> Hi Marc,
>>
>> On 05.04.2022 20:50, Marc Zyngier wrote:
>>> When booting with maxcpus=<small number> (or even loading a driver
>>> while most CPUs are offline), it is pretty easy to observe managed
>>> affinities containing a mix of online and offline CPUs being passed
>>> to the irqchip driver.
>>>
>>> This means that the irqchip cannot trust the affinity passed down
>>> from the core code, which is a bit annoying and requires (at least
>>> in theory) all drivers to implement some sort of affinity narrowing.
>>>
>>> In order to address this, always limit the cpumask to the set of
>>> online CPUs.
>>>
>>> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
>> This patch landed in linux next-20220413 as commit 33de0aa4bae9
>> ("genirq: Always limit the affinity to online CPUs"). Unfortunately it
>> breaks booting of most ARM 32bit Samsung Exynos based boards.
>>
>> I don't see anything specific in the log, though. Booting just hangs at
>> some point. The only Samsung Exynos boards that boot properly are those
>> Exynos4412 based.
>>
>> I assume that this is related to the Multi Core Timer IRQ configuration
>> specific for that SoCs. Exynos4412 uses PPI interrupts, while all other
>> Exynos SoCs have separate IRQ lines for each CPU.
>>
>> Let me know how I can help debugging this issue.
> Thanks for the heads up. Can you pick the last working kernel, enable
> CONFIG_GENERIC_IRQ_DEBUGFS, and dump the /sys/kernel/debug/irq/irqs/
> entries for the timer IRQs?

Exynos4210, Trats board, next-20220411:

root@target:~# cat /proc/interrupts | grep mct
 40:          0          0 GIC-0  89 Level     mct_comp_irq
 41:       4337          0 GIC-0  74 Level     mct_tick0
 42:          0      11061 GIC-0  80 Level     mct_tick1
root@target:~# cat /sys/kernel/debug/irq/irqs/41
handler:  handle_fasteoi_irq
device:   (null)
status:   0x00003504
            _IRQ_NOPROBE
            _IRQ_NOAUTOEN
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x13403604
            IRQ_TYPE_LEVEL_HIGH
            IRQD_LEVEL
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_NO_BALANCING
            IRQD_SINGLE_TARGET
            IRQD_AFFINITY_SET
            IRQD_DEFAULT_TRIGGER_SET
            IRQD_HANDLE_ENFORCE_IRQCTX
node:     0
affinity: 0
effectiv: 0
domain:  :soc:interrupt-controller@10490000
 hwirq:   0x4a
 chip:    GIC-0
  flags:   0x15
             IRQCHIP_SET_TYPE_MASKED
             IRQCHIP_MASK_ON_SUSPEND
             IRQCHIP_SKIP_SET_WAKE
root@target:~# cat /sys/kernel/debug/irq/irqs/42
handler:  handle_fasteoi_irq
device:   (null)
status:   0x00003504
            _IRQ_NOPROBE
            _IRQ_NOAUTOEN
istate:   0x00000000
ddepth:   0
wdepth:   0
dstate:   0x13403604
            IRQ_TYPE_LEVEL_HIGH
            IRQD_LEVEL
            IRQD_ACTIVATED
            IRQD_IRQ_STARTED
            IRQD_NO_BALANCING
            IRQD_SINGLE_TARGET
            IRQD_AFFINITY_SET
            IRQD_DEFAULT_TRIGGER_SET
            IRQD_HANDLE_ENFORCE_IRQCTX
node:     0
affinity: 1
effectiv: 1
domain:  :soc:interrupt-controller@10490000
 hwirq:   0x50
 chip:    GIC-0
  flags:   0x15
             IRQCHIP_SET_TYPE_MASKED
             IRQCHIP_MASK_ON_SUSPEND
             IRQCHIP_SKIP_SET_WAKE
root@target:~#

> Also, see below.
>
>>> ---
>>> kernel/irq/manage.c | 25 +++++++++++++++++--------
>>> 1 file changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
>>> index c03f71d5ec10..f71ecc100545 100644
>>> --- a/kernel/irq/manage.c
>>> +++ b/kernel/irq/manage.c
>>> @@ -222,11 +222,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>>> {
>>> struct irq_desc *desc = irq_data_to_desc(data);
>>> struct irq_chip *chip = irq_data_get_irq_chip(data);
>>> + const struct cpumask *prog_mask;
>>> int ret;
>>>
>>> + static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
>>> + static struct cpumask tmp_mask;
>>> +
>>> if (!chip || !chip->irq_set_affinity)
>>> return -EINVAL;
>>>
>>> + raw_spin_lock(&tmp_mask_lock);
>>> /*
>>> * If this is a managed interrupt and housekeeping is enabled on
>>> * it check whether the requested affinity mask intersects with
>>> @@ -248,24 +253,28 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>>> */
>>> if (irqd_affinity_is_managed(data) &&
>>> housekeeping_enabled(HK_TYPE_MANAGED_IRQ)) {
>>> - const struct cpumask *hk_mask, *prog_mask;
>>> -
>>> - static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
>>> - static struct cpumask tmp_mask;
>>> + const struct cpumask *hk_mask;
>>>
>>> hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
>>>
>>> - raw_spin_lock(&tmp_mask_lock);
>>> cpumask_and(&tmp_mask, mask, hk_mask);
>>> if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
>>> prog_mask = mask;
>>> else
>>> prog_mask = &tmp_mask;
>>> - ret = chip->irq_set_affinity(data, prog_mask, force);
>>> - raw_spin_unlock(&tmp_mask_lock);
>>> } else {
>>> - ret = chip->irq_set_affinity(data, mask, force);
>>> + prog_mask = mask;
>>> }
>>> +
>>> + /* Make sure we only provide online CPUs to the irqchip */
>>> + cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
>>> + if (!cpumask_empty(&tmp_mask))
>>> + ret = chip->irq_set_affinity(data, &tmp_mask, force);
>>> + else
>>> + ret = -EINVAL;
> Can you also check that with the patch applied, it is this path that
> is taken and that it is the timer interrupts that get rejected? If
> that's the case, can you put a dump_stack() here and give me that
> stack trace? The use of irq_force_affinity() in the driver looks
> suspicious...

[    0.158241] smp: Bringing up secondary CPUs ...
[    0.166118] irq_do_set_affinity irq 42
[    0.166160] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
5.18.0-rc1-00002-g33de0aa4bae9-dirty #11708
[    0.166176] Hardware name: Samsung Exynos (Flattened Device Tree)
[    0.166188]  unwind_backtrace from show_stack+0x10/0x14
[    0.166220]  show_stack from dump_stack_lvl+0x58/0x70
[    0.166239]  dump_stack_lvl from irq_do_set_affinity+0x188/0x1c8
[    0.166258]  irq_do_set_affinity from irq_set_affinity_locked+0xf8/0x17c
[    0.166274]  irq_set_affinity_locked from irq_force_affinity+0x34/0x54
[    0.166290]  irq_force_affinity from exynos4_mct_starting_cpu+0xdc/0x11c
[    0.166308]  exynos4_mct_starting_cpu from
cpuhp_invoke_callback+0x190/0x38c
[    0.166328]  cpuhp_invoke_callback from
cpuhp_invoke_callback_range+0x98/0xb4
[    0.166345]  cpuhp_invoke_callback_range from
notify_cpu_starting+0x54/0x94
[    0.166364]  notify_cpu_starting from secondary_start_kernel+0x160/0x26c
[    0.166383]  secondary_start_kernel from 0x40101a00
[    0.166498] CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
[    0.166515] CPU1: Spectre v2: using BPIALL workaround
[    0.265631] smp: Brought up 1 node, 2 CPUs
[    0.268660] SMP: Total of 2 processors activated (96.00 BogoMIPS).
[    0.274583] CPU: All CPU(s) started in SVC mode.

> Finally, is there a QEMU emulation of one of these failing boards?

yes, smdkc210, I've reproduced it with the following command:

qemu-system-arm -serial null -serial stdio -kernel /tftpboot/qemu/zImage
-dtb /tftpboot/qemu/exynos4210-smdkv310.dtb -append
"console=ttySAC1,115200n8 earlycon root=/dev/mmcblk0p2 rootwait
init=/bin/bash ip=::::target::off cpuidle.off=1" -M smdkc210 -drive
file=qemu-smdkv310-mmcblk0.raw,if=sd,bus=0,index=2,format=raw


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland