Re: [PATCH v2 3/3] brcmfmac: sdio: Disable auto-tuning around commands expected to fail

From: Adrian Hunter
Date: Fri Jun 07 2019 - 08:33:56 EST


On 7/06/19 12:37 AM, Doug Anderson wrote:
> Hi,
>
> On Thu, Jun 6, 2019 at 7:00 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>>
>> On 3/06/19 9:37 PM, Douglas Anderson wrote:
>>> There are certain cases, notably when transitioning between sleep and
>>> active state, when Broadcom SDIO WiFi cards will produce errors on the
>>> SDIO bus. This is evident from the source code where you can see that
>>> we try commands in a loop until we either get success or we've tried
>>> too many times. The comment in the code reinforces this by saying
>>> "just one write attempt may fail".
>>>
>>> Unfortunately these failures sometimes end up causing an "-EILSEQ"
>>> back to the core which triggers a retuning of the SDIO card and that
>>> blocks all traffic to the card until it's done.
>>>
>>> Let's disable retuning around the commands we expect might fail.
>>
>> It seems to me that re-tuning needs to be prevented before the
>> first access otherwise it might be attempted there,
>
> By this I think you mean I wasn't starting my section early enough to
> catch the "1st KSO write". Oops. Thanks!
>
>
>> and it needs
>> to continue to be prevented during the transition when it might
>> reasonably be expected to fail.
>>
>> What about something along these lines:
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> index 4e15ea57d4f5..d932780ef56e 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> @@ -664,9 +664,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>  	int err = 0;
>>  	int err_cnt = 0;
>>  	int try_cnt = 0;
>> +	int need_retune = 0;
>> +	bool retune_release = false;
>>
>>  	brcmf_dbg(TRACE, "Enter: on=%d\n", on);
>>
>> +	/* Cannot re-tune if device is asleep */
>> +	if (on) {
>> +		need_retune = sdio_retune_get_needed(bus->sdiodev->func1); // TODO: host->can_retune ? host->need_retune : 0
>> +		sdio_retune_hold_now(bus->sdiodev->func1); // TODO: add sdio_retune_hold_now()
>> +		retune_release = true;
>> +	}
>
> The code below still has retries even for the "!on" case. That
> implies that you could still get CRC errors from the card in the "!on"
> direction too. Any reason why we shouldn't just hold retuning even
> for the !on case?

No

>
>
>> +
>>  	wr_val = (on << SBSDIO_FUNC1_SLEEPCSR_KSO_SHIFT);
>>  	/* 1st KSO write goes to AOS wake up core if device is asleep */
>>  	brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val, &err);
>> @@ -711,8 +720,16 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>  			err_cnt = 0;
>>  		}
>>  		/* bail out upon subsequent access errors */
>> -		if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS))
>> -			break;
>> +		if (err && (err_cnt++ > BRCMF_SDIO_MAX_ACCESS_ERRORS)) {
>> +			if (!retune_release)
>> +				break;
>> +			/*
>> +			 * Allow one more retry with re-tuning released in case
>> +			 * it helps.
>> +			 */
>> +			sdio_retune_release(bus->sdiodev->func1);
>> +			retune_release = false;
>
> I would be tempted to wait before adding this logic until we actually
> see that it's needed. Sure, doing one more transfer probably won't
> really hurt, but until we know that it actually helps it seems like
> we're just adding extra complexity?

Depends, what is the downside of unnecessarily returning an error from
brcmf_sdio_kso_control() in that case?

>
>
>> +		}
>>
>>  		udelay(KSO_WAIT_US);
>>  		brcmf_sdiod_writeb(bus->sdiodev, SBSDIO_FUNC1_SLEEPCSR, wr_val,
>> @@ -727,6 +744,18 @@ brcmf_sdio_kso_control(struct brcmf_sdio *bus, bool on)
>>  	if (try_cnt > MAX_KSO_ATTEMPTS)
>>  		brcmf_err("max tries: rd_val=0x%x err=%d\n", rd_val, err);
>>
>> +	if (retune_release) {
>> +		/*
>> +		 * CRC errors are not unexpected during the transition but they
>> +		 * also trigger re-tuning. Clear that here to avoid an
>> +		 * unnecessary re-tune if it wasn't already triggered to start
>> +		 * with.
>> +		 */
>> +		if (!need_retune)
>> +			sdio_retune_clear_needed(bus->sdiodev->func1); // TODO: host->need_retune = 0
>> +		sdio_retune_release(bus->sdiodev->func1); // TODO: add sdio_retune_release()
>> +	}
>
> Every time I re-look at this I have to re-figure out all the subtle
> differences between the variables and functions involved here. Let me
> see if I got everything right:
>
> * need_retune: set to 1 if we can retune and some event happened that
> makes us truly believe that we need to be retuned, like we got a CRC
> error or a timer expired or our host controller told us to retune.
>
> * retune_now: set to 1 if it's an OK time to be retuning. Specifically
> if retune_now is false we won't send any retuning commands but we'll
> still keep track of the need to retune.
>
> * hold_retune: If this gets set to 1 by mmc_retune_hold_now() then a
> future call to mmc_retune_hold() will _not_ schedule a retune by
> setting retune_now (because mmc_retune_hold() will see that
> hold_retune was already 1). ...and a future call to
> mmc_retune_recheck() between mmc_retune_hold() and
> mmc_retune_release() will also not schedule a retune because
> hold_retune will be 2 (or generally > 1).
>
> ---
>
> So overall trying to summarize what I think are the differences
> between your patch and my patch.
>
> 1. If we needed to re-tune _before_ calling brcmf_sdio_kso_control(),
> with your patch we'll make sure that we don't actually attempt to
> retune until brcmf_sdio_kso_control() finishes.
>
> 2. If we needed to retune during brcmf_sdio_kso_control() (because a
> timer expired?) then we wouldn't trigger that retune while
> brcmf_sdio_kso_control() is running.
>
> In the case of dw_mmc, which I'm most familiar with, we don't have any
> sort of automated or timer-based retuning. ...so we'll only re-tune
> when we see the CRC error. If I'm understanding things correctly then
> for dw_mmc my solution and yours behave the same. That means the
> difference is how we deal with other retuning requests, either ones
> that come about because of an interrupt that the host controller
> provided or because of a timer. Did I get that right?
>
> ...and I guess the reason we have to deal specially with these cases
> is because any time that the SDIO card is "sleeping" we don't want to
> retune because it won't work. Right? NOTE: the solution that would
> come to my mind first to solve this would be to hold the retuning for
> the whole time that the card was sleeping and then release it once the
> card was awake again. ...but I guess we don't truly need to do that
> because tuning only happens as a side effect of sending a command to
> the card and the only command we send to the card is the "wake up"
> command. That's why your solution to hold tuning while sending the
> "wake up" command works, right?
>
> ---
>
> OK, so assuming all the above is correct, I feel like we're actually
> solving two problems and in fact I believe we actually need both our
> approaches to solve everything correctly. With just your patch in
> place there's a problem because we will clobber any external retuning
> requests that happened while we were waking up the card. AKA, imagine
> this:
>
> A) brcmf_sdio_kso_control(on=True) gets called; need_retune starts as 0
>
> B) We call sdio_retune_hold_now()
>
> C) A retuning timer goes off or the SD Host controller tells us to retune
>
> D) We get to the end of brcmf_sdio_kso_control() and clear the "retune
> needed" since need_retune was 0 at the start.
>
> ...so we dropped the retuning request from C), right?

True
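
To make the A) to D) sequence concrete, here is a minimal userspace
sketch. It is not kernel code: struct host_model and every helper name
below are made up for illustration and only mimic the need_retune /
hold_retune bookkeeping discussed above. It assumes a CRC error and an
external request (timer or host controller) set the same need_retune
flag, which is why the final clear cannot tell them apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few host fields discussed in this thread. */
struct host_model {
	bool can_retune;
	int need_retune;	/* a retune has been requested */
	int hold_retune;	/* > 0: retuning is deferred */
};

static void retune_hold_now(struct host_model *h)
{
	h->hold_retune++;
}

static void retune_release(struct host_model *h)
{
	assert(h->hold_retune > 0);
	h->hold_retune--;
}

/* Any source of a retune request: CRC error, timer, host controller. */
static void retune_needed(struct host_model *h)
{
	if (h->can_retune)
		h->need_retune = 1;
}

/*
 * Steps A) to D) with the save-and-clear scheme from the diff above.
 * Returns need_retune as seen once the wake-up sequence has finished.
 */
static int kso_wake_save_and_clear(struct host_model *h, bool external_request)
{
	int saved = h->need_retune;	/* A) snapshot on entry */

	retune_hold_now(h);		/* B) */
	retune_needed(h);		/* expected CRC error while waking up */
	if (external_request)
		retune_needed(h);	/* C) timer or host controller */
	if (!saved)
		h->need_retune = 0;	/* D) clears both kinds of request */
	retune_release(h);

	return h->need_retune;
}
```

With need_retune starting at 0 and external_request set, the function
returns 0, i.e. the request from C) is gone. If need_retune was already
1 on entry it is left alone.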

>
>
> What we truly need is:
>
> 1. CRC errors shouldn't trigger a retuning request when we're in
> brcmf_sdio_kso_control()
>
> 2. A separate patch that holds any retuning requests while the SDIO
> card is off. This patch _shouldn't_ do any clearing of retuning
> requests, just defer them.
>
>
> Does that make sense to you? If so, I can try to code it up...

Sounds good :-)
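
For reference, a rough userspace sketch of how the two points could fit
together. Again this is only a model under assumptions:
crc_retune_disabled and the function names are hypothetical, not an
existing API, and the "card is off" period from point 2 is collapsed
into the wake-up window for brevity. The idea is that expected CRC
errors never set need_retune, while external requests are kept and only
deferred by the hold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host state discussed in this thread. */
struct host_model {
	bool can_retune;
	bool crc_retune_disabled;	/* hypothetical: CRC errors do not request a retune */
	int need_retune;		/* a retune has been requested */
	int hold_retune;		/* > 0: retuning is deferred */
};

static void crc_error(struct host_model *h)
{
	if (h->can_retune && !h->crc_retune_disabled)
		h->need_retune = 1;
}

/* Timer expiry or host controller event. */
static void external_retune_request(struct host_model *h)
{
	if (h->can_retune)
		h->need_retune = 1;
}

/* Returns need_retune as seen once the wake-up sequence has finished. */
static int kso_wake_defer_only(struct host_model *h, bool external_request)
{
	h->crc_retune_disabled = true;	/* point 1: ignore expected CRC errors */
	h->hold_retune++;		/* point 2: defer, never clear */

	crc_error(h);			/* expected failure while waking up */
	if (external_request)
		external_retune_request(h);

	assert(h->hold_retune > 0);
	h->hold_retune--;
	h->crc_retune_disabled = false;

	return h->need_retune;
}
```

In this model an external request raised during the wake-up survives
(the function returns 1), and a wake-up that only saw the expected CRC
error leaves need_retune at 0 without any explicit clearing.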