Re: [PATCH 3/4] x86: hyperv: Expose hv_map_msi_interrupt function
From: Nuno Das Neves
Date: Mon Jun 23 2025 - 18:15:39 EST
On 6/20/2025 9:19 AM, Thomas Gleixner wrote:
> On Wed, Jun 18 2025 at 14:08, Nuno Das Neves wrote:
>> On 6/11/2025 4:07 PM, Michael Kelley wrote:
>>> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, June 10, 2025 4:52 PM
>>>> +/**
>>>> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
>>>> + * @data: Describes the IRQ
>>>> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
>>>> + *
>>>> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
>>>> + */
>>>> +int hv_map_msi_interrupt(struct irq_data *data,
>>>> + struct hv_interrupt_entry *out_entry)
>>>> {
>>>> - union hv_device_id device_id = hv_build_pci_dev_id(dev);
>>>> + struct msi_desc *msidesc;
>>>> + struct pci_dev *dev;
>>>> + union hv_device_id device_id;
>>>> + struct hv_interrupt_entry dummy;
>>>> + struct irq_cfg *cfg = irqd_cfg(data);
>>>> + const cpumask_t *affinity;
>>>> + int cpu;
>>>> + u64 res;
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
>
>>>>
>>>> - return hv_map_interrupt(device_id, false, cpu, vector, entry);
>>>> + msidesc = irq_data_get_msi_desc(data);
>>>> + dev = msi_desc_to_pci_dev(msidesc);
>>>> + device_id = hv_build_pci_dev_id(dev);
>>>> + affinity = irq_data_get_effective_affinity_mask(data);
>>>> + cpu = cpumask_first_and(affinity, cpu_online_mask);
>>>
>>> Is the cpus_read_lock held at this point? I'm not sure what the
>>> overall calling sequence looks like. If it is not held, the CPU that
>>> is selected could go offline before hv_map_interrupt() is called.
>>> This computation of the target CPU is the same as in the code
>>> before this patch, but that existing code looks like it has the
>>> same problem.
>>>
>>
>> Thanks for pointing it out - it *looks* like the read lock is not held
>> everywhere this could be called, so it could indeed be a problem.
>>
>> I've been thinking about different ways around this but I lack the
>> knowledge to have an informed opinion about it:
>>
>> - We could take the cpu read lock in this function; would that work?
>
> Obviously not.
>
>> - I'm not actually sure why the code is getting the first cpu off the
>> effective affinity mask in the first place. It is possible to get the
>> apic id (and hence the cpu) already associated with the irq, as per e.g.
>> x86_vector_msi_compose_msg(). Maybe we could get the cpu that way,
>> assuming that doesn't have a similar issue.
>
> There is no reason to fiddle in the underlying low level data. The
> effective affinity mask is there for a reason.
>
>> - We could just let this race happen, maybe the outcome isn't too catastrophic?
>
> Let's terminate guesswork mode and look at the facts.
>
> The point is that hv_map_msi_interrupt() is called from:
>
> 1) hv_irq_compose_msi_msg()
>
> 2) hv_arch_irq_unmask() (in patch 4/4)
>
> Both functions are interrupt chip callbacks and invoked with the
> interrupt descriptor lock held.
>
> At the point where they are called, the effective affinity mask is valid
> and immutable. Nothing can modify it as any modification requires the
> interrupt descriptor lock to be held. This applies to the CPU hotplug
> machinery too. So the AND with cpu_online_mask is a completely pointless
> voodoo exercise.
>
Thanks for explaining.
> Just use:
>
> cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
>
> and be done with it.
>
> Please fix that first with a separate patch before moving this code
> around.
Will do!
Nuno
>
> Thanks,
>
> tglx