RE: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

From: Shameerali Kolothum Thodi
Date: Tue Nov 13 2018 - 10:58:03 EST




> -----Original Message-----
> From: mika.westerberg@xxxxxxxxxxxxxxx
> [mailto:mika.westerberg@xxxxxxxxxxxxxxx]
> Sent: 13 November 2018 15:08
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Wangzhou (B)
> <wangzhou1@xxxxxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>; Lukas
> Wunner <lukas@xxxxxxxxx>
> Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue

[...]

> > Right. As I mentioned in my previous mail, I missed the fact that you are
> updating
> > the ctrl->slot_ctrl with cmd value while in my test I did my update with the
> value
> > returned by pcie_capability_read_word().
>
> OK, I see.
>
> > > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> > > pciehp_isr().
> >
> > Ok.
> >
> > > Here's an updated patch, can you try and see if it makes any difference?
> >
> > I just tried this and it works. Thanks.
>
> Can you still check that the previous one (without _CCIE check) works?

Yes, it works for me without _CCIE.

> > See few comments below.
> >
> > > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > > b/drivers/pci/hotplug/pciehp_hpc.c
> > > index 7dd443aea5a5..da2cbe892444 100644
> > > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> *ctrl,
> > > u16 cmd,
> > > slot_ctrl |= (cmd & mask);
> > > ctrl->cmd_busy = 1;
> > > smp_mb();
> > > + ctrl->slot_ctrl = slot_ctrl;
> >
> > Does it make more sense if we can move this before smp_mb()?. Also I am
> not
> > sure updating the ctrl->slot_ctrl before actually the hardware is
> programmed
> > with that value will result in any other race conditions? TBH, I am not that
> familiar
> > with this code and I leave that to you :)
>
> Both are good questions :)
>
> For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I
> think we should be fine and this is actually more correct because if we
> are unmasking interrupts they may trigger immediately making
> pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the
> issue you reported).

Ok. I was more concerned about an unsolicited event triggering the _isr
while we are modifying the ctrl->slot_ctrl. But that's ok I think as the _isr
reads the hw status anyway.

> The smb_mb() thing is not that clear (at least to me) because it is used
> in two places in the driver and both seem to be making write to
> ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
> the read part.
>
> I may be missing something, though.

I think the read part is in wait_event_timeout() which evaluates the condition.
The wake_up is called from the pciehp_isr(). Since the flag is being updated
in both process level and interrupt handler context, smp_mb() is used. I think
the same now applies to ctrl->slot_ctrl now as this being used in process
context and interrupt context as well.

Thanks,
Shameer