Re: [PATCH 1/2] mwifiex: Use non-posted PCI register writes

From: Pali Rohár
Date: Thu Sep 30 2021 - 12:39:10 EST


On Thursday 30 September 2021 18:22:42 Jonas Dreßler wrote:
> On 9/30/21 6:19 PM, Pali Rohár wrote:
> > On Thursday 30 September 2021 18:14:04 Jonas Dreßler wrote:
> > > On 9/30/21 5:42 PM, Pali Rohár wrote:
> > > > On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:
> > > > > On 9/23/21 10:22 PM, Pali Rohár wrote:
> > > > > > On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:
> > > > > > > On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@xxxxxxx> wrote:
> > > > > > > > On 9/22/21 2:50 PM, Jonas Dreßler wrote:
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > > - Just calling mwifiex_write_reg() once and then blocking until the card
> > > > > > > > wakes up using my delay-loop doesn't fix the issue, it's actually
> > > > > > > > writing multiple times that fixes the issue
> > > > > > > >
> > > > > > > > These observations sound a lot like writes (and even reads) are actually
> > > > > > > > being dropped, don't they?
> > > > > > >
> > > > > > > It sounds like you're writing into a not ready (fully powered on) device.
> > > > > >
> > > > > > This reminds me a discussion with Bjorn about CRS response returned
> > > > > > after firmware crash / reset when device is not ready yet:
> > > > > > https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/
> > > > > >
> > > > > > Could not be this similar issue? You could check it via reading
> > > > > > PCI_VENDOR_ID register from config space. And if it is not valid value
> > > > > > then card is not really ready yet.
> > > > > >
> > > > > > > To check this, try to put a busy loop for reading and check the value
> > > > > > > till it gets 0.
> > > > > > >
> > > > > > > Something like
> > > > > > >
> > > > > > > unsigned int count = 1000;
> > > > > > >
> > > > > > > do {
> > > > > > > if (mwifiex_read_reg(...) == 0)
> > > > > > > break;
> > > > > > > } while (--count);
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > With Best Regards,
> > > > > > > Andy Shevchenko
> > > > >
> > > > > I've tried both reading PCI_VENDOR_ID and the firmware status using a busy
> > > > > loop now, but sadly none of them worked. It looks like the card always
> > > > > replies with the correct values even though it sometimes won't wake up after
> > > > > that.
> > > > >
> > > > > I do have one new observation though, although I've no clue what could be
> > > > > happening here: When reading PCI_VENDOR_ID 1000 times to wakeup we can
> > > > > "predict" the wakeup failure because exactly one (usually around the 20th)
> > > > > of those 1000 reads will fail.
> > > >
> > > > What does "fail" means here?
> > >
> > > ioread32() returns all ones, that's interpreted as failure by
> > > mwifiex_read_reg().
> >
> > Ok. And can you check if PCI Bridge above this card has enabled CRSSVE
> > bit (CRSVisible in RootCtl+RootCap in lspci output)? To determinate if
> > Bridge could convert CRS response to all-ones as failed transaction.
> >
>
> Seems like that bit is disabled:
> > RootCap: CRSVisible-
> > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-

So it means that CRSSVE is unsupported by upper bridge. In case card
returns CRS response to system (via bridge) that is not ready yet,
bridge re-issue read request, and after some failures it returns to
system all-ones to indicate failed transaction. But all-ones can be
returned also by bridge when card does not send any response.

So from this test we do not know what happened. It would be nice to know
it, but such test requires to connect this card into system which
supports CRSSVE, in which case CRS response it passed directly to OS as
value 0xffff0001. Look at the link above where I discussed with Bjorn
about buggy wifi cards which resets internally, for more details.

But in this setup when CRSSVE is not supported, I think there is no
other option than just adding sleep prior accessing card...

For debugging such issues I got the only advice to use PCIe analyzer and
look at what is really happening on the bus. But required equipment for
this debugging is not cheap...

> > > >
> > > > > Maybe the firmware actually tries to wake up,
> > > > > encounters an error somewhere in its wakeup routines and then goes down a
> > > > > special failure code path. That code path keeps the cards CPU so busy that
> > > > > at some point a PCI_VENDOR_ID request times out?
> > > > >
> > > > > Or well, maybe the card actually wakes up fine, but we don't receive the
> > > > > interrupt on our end, so many possibilities...