Re: [PATCH] xen: reuse the same pirq allocated when driver load first time

From: Konrad Rzeszutek Wilk
Date: Tue May 14 2013 - 10:20:51 EST


On Tue, May 14, 2013 at 02:49:50PM +0100, Stefano Stabellini wrote:
> On Mon, 13 May 2013, Konrad Rzeszutek Wilk wrote:
> > On Mon, May 13, 2013 at 06:24:46PM +0100, Stefano Stabellini wrote:
> > > On Mon, 13 May 2013, Konrad Rzeszutek Wilk wrote:
> > > > On Mon, May 13, 2013 at 03:50:52PM +0100, Stefano Stabellini wrote:
> > > > > On Mon, 13 May 2013, Konrad Rzeszutek Wilk wrote:
> > > > > > On Mon, May 13, 2013 at 12:06:43PM +0100, Stefano Stabellini wrote:
> > > > > > > On Fri, 10 May 2013, Konrad Rzeszutek Wilk wrote:
> > > > > > > > On Wed, May 08, 2013 at 04:18:24PM +0800, Zhenzhong Duan wrote:
> > > > > > > > > When driver load and unload in a loop, pirq will exhaust finally.
> > > > > > > > > Try to use the same pirq which was already mapped and binded at first time
> > > > > > > >
> > > > > > > > So what happens if I unload and reload two drivers in random order?
> > > > > > > >
> > > > > > > > > when driver loaded.
> > > > > > > > >
> > > > > > > > > Read pirq from msix entry and test if data is XEN_PIRQ_MSI_DATA
> > > > > > > > > xen_irq_from_pirq(pirq) < 0 checking is wrong as irq will be freed
> > > > > > > > > when driver unload, it's always true in second load.
> > > > > > > >
> > > > > > > > If my understanding is right the issue at hand is that the caching
> > > > > > > > information about the pirq disappears once the driver has been
> > > > > > > > unloaded b/c the event's irq-info is removed (as the driver is
> > > > > > > > unloaded and free_irq is called).
> > > > > > > >
> > > > > > > > Stefano,
> > > > > > > > Is there a specific write to the MSI structure that would cause the
> > > > > > > > hypervisor to drop the PIRQ? Or a nice hypercall to "free" a
> > > > > > > > PIRQ in use?
> > > > > > >
> > > > > > > We already have a "free PIRQ" hypercall, it's called
> > > > > > > PHYSDEVOP_unmap_pirq and should be called by QEMU.
> > > > > >
> > > > > > Considering that we call function that allocates (PHYSDEVOP_get_free_pirq)
> > > > > > it in the Linux kernel (and not in QEMU), perhaps that should be done in the
> > > > > > Linux kernel as part of xen_destroy_irq()? Or would that confuse QEMU?
> > > > >
> > > > > I think it would confuse QEMU. It is probably better to let the unmap
> > > > > be handled by it.
> > > > >
> > > > >
> > > > > > It looks like QEMU only does that hypercall (via xc_physdev_unmap_pirq)
> > > > > > in unregister_real_device, which is only called during pci unplug?
> > > > >
> > > > > You are right! I would think that this behaviour is erroneous unless it
> > > > > was done on purpose to avoid allocating MSIs twice.
> > > > > If that is the case we would need to do something similar in Linux too.
> > > > >
> > > > > I think that the issue is the mismatch between QEMU's and Linux's
> > > > > behaviours: either both should be allocating MSIs once, or they should
> > > > > both be allocating and deallocating MSIs every time the driver is loaded
> > > > > and unloaded.
> > > >
> > > > Right. But we also have the scenario that QEMU and Linux are going to
> > > > be out of sync. So we need fixes in both places - I think.
> > >
> > > QEMU is the owner of the pirq, in fact it is the one that creates and
> > > destroys the mapping. I think that the right place to fix this problem
> > > is in QEMU, the ABI would be much cleaner as a result. As a side effect
> > > we don't need to make any changes in Linux.
> >
> > You do. You need to remove the PHYSDEVOP_get_free_pirq call in that case.
>
> PHYSDEVOP_get_free_pirq needs to stay, because Linux needs to know the
> pirq that QEMU is going to use.

That looks like an API violation. We have a hypercall that
allocates the PIRQ in Linux, then two hypercalls in the QEMU
layer - one to map, and the other to unmap and free.


>
> However I would let QEMU handle the mapping (it already does that in
> pt_msi_setup calling xc_physdev_map_pirq_msi) and unmapping (that is
> done by calling xc_domain_unbind_msi_irq from pt_msi_disable).
> I think the problem is that pt_msi_disable is only called on
> unregister_real_device and pt_reset_interrupt_and_io_mapping, not when
> the guest disables MSIs.

Sure, I am not disputing that. I think the fix in QEMU to call the
unmap is correct.

But I am also wondering whether it makes sense to do that in the Linux
kernel - as it does the alloc in the first place. Seems like a bit of
duct-tape has been used to connect this plumbing together.
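
To make the leak concrete, here is a rough toy model in plain C. It is
only a sketch: it does not issue any hypercalls, and every toy_* name
and the pool size are made up for illustration. toy_get_free_pirq stands
in for the allocation done on driver load (PHYSDEVOP_get_free_pirq in
the discussion above) and toy_unmap_pirq for the release step
(PHYSDEVOP_unmap_pirq). The loop shows what happens when each load
allocates and the unload never frees.

```c
#include <stdbool.h>

/* Hypothetical pool size for illustration; not Xen's real limit. */
#define TOY_NR_PIRQS 8

struct toy_pirq_pool {
    bool in_use[TOY_NR_PIRQS];
};

/* Toy stand-in for the allocation step: returns a free pirq, or -1. */
static int toy_get_free_pirq(struct toy_pirq_pool *pool)
{
    for (int i = 0; i < TOY_NR_PIRQS; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

/* Toy stand-in for the release step: marks the pirq as free again. */
static void toy_unmap_pirq(struct toy_pirq_pool *pool, int pirq)
{
    if (pirq >= 0 && pirq < TOY_NR_PIRQS)
        pool->in_use[pirq] = false;
}

/*
 * Load and unload a driver `cycles` times against a fresh pool.
 * Returns the number of loads that managed to get a pirq.
 */
static int toy_load_unload_loop(int cycles, bool free_on_unload)
{
    struct toy_pirq_pool pool = { { false } };
    int ok = 0;

    for (int i = 0; i < cycles; i++) {
        int pirq = toy_get_free_pirq(&pool);  /* driver load */
        if (pirq < 0)
            break;                            /* nothing left to hand out */
        ok++;
        if (free_on_unload)                   /* driver unload */
            toy_unmap_pirq(&pool, pirq);
    }
    return ok;
}
```

With free_on_unload false the loop stops once the pool is used up, which
is the exhaustion the patch description reports. With it true the same
pirq is handed out again on every load. The model says nothing about
which side (Linux or QEMU) should make the release call.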

