Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

From: Dongli Zhang
Date: Mon Apr 22 2024 - 19:12:04 EST


Hi Thomas,

On 4/22/24 13:58, Thomas Gleixner wrote:
> On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:
>
>> When a CPU is offline, its IRQs may migrate to other CPUs. For managed
>> IRQs, they are migrated, or shutdown (if all CPUs of the managed IRQ
>> affinity are offline). For regular IRQs, there will only be a
>> migration.
>
> Please write out interrupts. There is enough space for it and IRQ is
> just not a regular word.

I will use "interrupts".

>
>> The migrate_one_irq() first uses pending_mask or affinity_mask of the IRQ.
>>
>> 104 if (irq_fixup_move_pending(desc, true))
>> 105 affinity = irq_desc_get_pending_mask(desc);
>> 106 else
>> 107 affinity = irq_data_get_affinity_mask(d);
>>
>> The migrate_one_irq() may use all online CPUs, if all CPUs in
>> pending_mask/affinity_mask are already offline.
>>
>> 113 if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>> 114 /*
>> 115 * If the interrupt is managed, then shut it down and leave
>> 116 * the affinity untouched.
>> 117 */
>> 118 if (irqd_affinity_is_managed(d)) {
>> 119 irqd_set_managed_shutdown(d);
>> 120 irq_shutdown_and_deactivate(desc);
>> 121 return false;
>> 122 }
>> 123 affinity = cpu_online_mask;
>> 124 brokeaff = true;
>> 125 }
>
> Please don't copy code into the change log. Describe the problem in
> text.

Would you mind suggesting if the below commit message is fine to you?


genirq/cpuhotplug: retry with cpu_online_mask when irq_do_set_affinity return
-ENOSPC

When a CPU goes offline, the interrupts pinned to that CPU are
re-configured.

Its managed interrupts undergo either migration to other CPUs or shutdown
if all CPUs listed in the affinity are offline. This patch doesn't affect
managed interrupts.

For regular interrupts, they are migrated to other selected online CPUs.
The target CPUs are chosen from either desc->pending_mask (suppose
CONFIG_GENERIC_PENDING_IRQ) or d->common->affinity (suppose CONFIG_SMP).
The cpu_online_mask is used as target CPUs only when CPUs in both
desc->pending_mask and d->common->affinity are offline.

However, there is a bad corner case, when desc->pending_mask or
d->common->affinity is selected as the target cpumask, but none of their
CPUs has any available vectors.

As a result, an -ENOSPC error happens:

"IRQ151: set affinity failed(-28)."

This is from the debugfs. The allocation fails although other online CPUs
(except CPU=2) have many free vectors.

name: VECTOR
size: 0
mapped: 529
flags: 0x00000103
Online bitmaps: 7
Global available: 884
Global reserved: 6
Total allocated: 539
System: 36: 0-19,21,50,128,236,243-244,246-255
| CPU | avl | man | mac | act | vectors
0 147 0 0 55 32-49,51-87
1 147 0 0 55 32-49,51-87
2 0 0 0 202 32-49,51-127,129-235
4 147 0 0 55 32-49,51-87
5 147 0 0 55 32-49,51-87
6 148 0 0 54 32-49,51-86
7 148 0 0 54 32-49,51-86

The steps to reproduce the issue are in [1]. The core idea is:

1. Create a KVM guest with many virtio-net PCI devices, each configured
with a very large number of queues/vectors.

2. Set the affinity of all virtio-net interrupts to "2,3".

3. Offline many CPUs, excluding "2,3".

4. Offline CPU=2, and irq_do_set_affinity() returns -ENOSPC.

For regular interrupts, if irq_do_set_affinity() returns -ENOSPC, retry it
with all online CPUs. The issue does not happen for managed interrupts
because the vectors are always reserved (in cm->managed_map) before the CPU
offline operation.

[1] https://lore.kernel.org/all/20240419013322.58500-1-dongli.zhang@xxxxxxxxxx/

Cc: Joe Jin <joe.jin@xxxxxxxxxx>
Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>


>
>> However, there is a corner case. Although some CPUs in
>> pending_mask/affinity_mask are still online, they are lack of available
>> vectors. If the kernel continues calling irq_do_set_affinity() with those CPUs,
>> there will be -ENOSPC error.
>>
>> This is not reasonable as other online CPUs still have many available
>> vectors.
>
> Reasonable is not the question here. It's either correct or not.

This has been re-written in the new commit message.

>
>> name: VECTOR
>> size: 0
>> mapped: 529
>> flags: 0x00000103
>> Online bitmaps: 7
>> Global available: 884
>> Global reserved: 6
>> Total allocated: 539
>> System: 36: 0-19,21,50,128,236,243-244,246-255
>> | CPU | avl | man | mac | act | vectors
>> 0 147 0 0 55 32-49,51-87
>> 1 147 0 0 55 32-49,51-87
>> 2 0 0 0 202 32-49,51-127,129-235
>
> Just ouf of curiousity. How did this end up with CPU2 completely
> occupied?

The details are in the link:
https://lore.kernel.org/all/20240419013322.58500-1-dongli.zhang@xxxxxxxxxx/

Here is the core idea:

1. Create a KVM guest with many virtio-net PCI devices, each configured
with a very large number of queues/vectors.

2. Set the affinity of all virtio-net interrupts to "2,3".

3. Offline many CPUs, excluding "2,3".

4. Offline CPU=2, and irq_do_set_affinity() returns -ENOSPC.

>
>> 4 147 0 0 55 32-49,51-87
>> 5 147 0 0 55 32-49,51-87
>> 6 148 0 0 54 32-49,51-86
>> 7 148 0 0 54 32-49,51-86
>>
>> This issue should not happen for managed IRQs because the vectors are already
>> reserved before CPU hotplug.
>
> Should not? It either does or it does not.

It is "does not". I will fix it. Please see new commit message.

>
>> For regular IRQs, do a re-try with all online
>> CPUs if the prior irq_do_set_affinity() is failed with -ENOSPC.
>>
>> Cc: Joe Jin <joe.jin@xxxxxxxxxx>
>> Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
>> ---
>> kernel/irq/cpuhotplug.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
>> index 1ed2b1739363..d1666a6b73f4 100644
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
>> * CPU.
>> */
>> err = irq_do_set_affinity(d, affinity, false);
>> +
>> + if (err == -ENOSPC &&
>> + !irqd_affinity_is_managed(d) &&
>> + affinity != cpu_online_mask) {
>
> This really wants to be a single line conditional.

I will change it to a single line.

Would you mind suggesting which is preferred? !cpumask_equal(affinity,
cpu_online_mask) or (affinity != cpu_online_mask)?


>
>> + affinity = cpu_online_mask;
>> + brokeaff = true;
>> +
>> + pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all online CPUs\n",
>> + d->irq, cpumask_pr_args(affinity));
>
> How is it useful to print cpu_online_mask here?

My bad and sorry about that. I should print the log before changing 'affinity' I
will fix that in the new version.


Thank you very much for feedback!

Dongli Zhang