RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

From: Zhang, Yang Z
Date: Tue Dec 23 2014 - 23:05:31 EST


Wu, Feng wrote on 2014-12-24:
>
>
> Zhang, Yang Z wrote on 2014-12-24:
>> Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> KVM list; Eric Auger
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>>
>> Jiang Liu wrote on 2014-12-24:
>>> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>>>> Paolo Bonzini wrote on 2014-12-23:
>>>>>
>>>>>
>>>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs
>>>>>>>> in host. Do you think it is a right behavior?
>>>>>>>
>>>>>>> No, the interrupt is not delivered at all in the host. Normally
>>>>>>> you'd have:
>>>>>>>
>>>>>>> - interrupt delivered to CPU from host affinity
>>>>>>>
>>>>>>> - VFIO interrupt handler writes to irqfd
>>>>>>>
>>>>>>> - interrupt delivered to vCPU from guest affinity
>>>>>>>
>>>>>>> Here, you just skip the first two steps. The interrupt is
>>>>>>> delivered to the thread that is running the vCPU directly, so
>>>>>>> the host affinity is bypassed entirely.
>>>>>>>
>>>>>>> ... unless you are considering the case where the vCPU is
>>>>>>> blocked and the host is processing the posted interrupt wakeup vector.
>>>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>>>> matching the host
>>> affinity.
>>>>>>
>>>>>> In my understanding, wakeup vector should have no relationship
>>>>>> with the host affinity of the irq. Wakeup notification event
>>>>>> should be delivered to the pCPU which the vCPU was blocked on.
>>>>>> And in kernel's point of view, the irq is not associated with
>>>>>> the wakeup vector,
>> right?
>>>>>
>>>>> That is correct indeed. It is not associated to the wakeup
>>>>> vector, hence this patch is right, I think.
>>>>>
>>>>> However, the wakeup vector has the same function as the VFIO
>>>>> interrupt handler, so you could argue that it is tied to the
>>>>> host affinity rather than the guest. Let's wait for Yang to answer.
>>>>
>>>> Actually, that's my original question too. I am wondering what
>>>> happens if the
>>> user changes the assigned device's affinity in host's /proc/irq/? If
>>> ignore it is acceptable, then this patch is ok. But it seems the
>>> discussion out of my scope, need some experts to tell us their idea
>>> since it will impact the user experience. Hi Yang,
>>
>> Hi Jiang,
>>
>>> Originally we have a proposal to return failure when user sets
>>> IRQ affinity through native OS interfaces if an IRQ is in PI mode.
>>> But that proposal will break CPU hot-removal because OS needs to
>>> migrate away all IRQs binding to the CPU to be offlined. Then we
>>> propose saving user IRQ affinity setting without changing hardware
>>> configuration (keeping PI configuration). Later when PI mode is
>>> disabled, the cached affinity setting will be used to setup IRQ
>>> destination for native OS. On the other hand, for IRQ in PI mode,
>>> it won't be delivered to native OS, so user may not sense that the
>>> IRQ is
>> delivered to CPUs other than those in the affinity set.
>>
>> The IRQ is still there but will be delivered to host in the form of
>> PI event(if the VCPU is running in root-mode). I am not sure whether
>> those interrupts should be reflected in /proc/interrupts? If the
>> answer is yes, then which entries should be used, a new PI entry or
>> use the
> original IRQ entry?
>
> Even though, setting the affinity of the IRQ in host should not affect
> the destination of the PI event (normal notification event of wakeup

This is your implementation. To me, disable PI if the VCPU is going to
run in the CPU out of IRQ affinity bitmap also is acceptable. And it will
keep the user interface looks the same as before.

Hi Thomas, Ingo, Peter

Can you guys help to review this patch? Really appreciate if you can give
some feedbacks.

> notification event), because the destination of the PI event is
> determined in NDST field of Posted-interrupts descriptor and PI
> notification vector is global. Just had a discussion with Jiang
> offline, maybe we can add the statistics information for the notification vector in /proc/interrupts just like any other global interrupts.
>
> Thanks,
> Feng
>
>>
>>> In that aspect, I think it's acceptable:) Regards!
>>
>> Yes, if all of you guys(especially the IRQ maintainer) are think it
>> is acceptable then we can follow current implementation and document it.
>>
>>> Gerry
>>>>
>>>>>
>>>>> Paolo
>>>>
>>>>
>>>> Best regards,
>>>> Yang
>>>>
>>>>
>>
>>
>> Best regards,
>> Yang
>>


Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/