Re: [BUG] mmc: core: adjust polling interval for CMD1

From: Ulf Hansson
Date: Thu Mar 03 2022 - 05:41:42 EST


On Wed, 2 Mar 2022 at 10:40, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> > Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>:
> >
> > Ulf Hansson wrote on Tue 1/03/22 14:38:
> >> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> >>>
> >>
> >> From: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>
> >> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >> which is probably more inline with its previous value, to fix the reported
> >> timeout problems.
> >>
> >> While at it, let's add a define for the timeout value, rather than using
> >> a hard-coded value for it.
> >>
> >> Reported-by: Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>
> >> Reported-by: H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>
> >> Cc: Huijin Park <huijin.park@xxxxxxxxxxx>
> >> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >> ---
> >> drivers/mmc/core/mmc_ops.c | 4 +++-
> >> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >> index d63d1c735335..1f57174b3cf3 100644
> >> --- a/drivers/mmc/core/mmc_ops.c
> >> +++ b/drivers/mmc/core/mmc_ops.c
> >> @@ -21,6 +21,7 @@
> >>
> >> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> >> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> >> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
> >>
> >> static const u8 tuning_blk_pattern_4bit[] = {
> >> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >> ocr, u32 *rocr)
> >> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>
> >> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >> + &__mmc_send_op_cond_cb, &cb_data);
> >> if (err)
> >> return err;
> >>
> >> --
> >> 2.25.1
> >
> > Hi,
> >
> > thanks. But testing with this patch still gives the same errors:
> >
> > [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > [ 52.273380] mmc1: error -110 doing runtime resume
> >
> > and the system gets stuck eventually.
>
> Same result from my tests.
>
> BR and thanks,
> Nikolaus

Alright, it starts to sound like we might need a revert (or at least a
way to restore the previous behaviour) - even if that would just paper
over the real problem. The real problem is more likely related to how
the host driver manages restoring of the power to the card, which
happens when runtime resuming it.

In any case, just to make sure the loop timeout itself isn't the
problem, can you run the below debug patch please? The intent is to
figure out how long the worst case timeout is, when it's working with
CMD1. As soon as the timeout exceeds the worst case, there is a
message printed to the log.

The below patch is based upon that the offending commit has been reverted.

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
Date: Thu, 3 Mar 2022 11:00:04 +0100
Subject: [PATCH] mmc: core: DEBUG - Measure and log worst case CMD1 loop

Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
---
drivers/mmc/core/mmc_ops.c | 11 +++++++++++
include/linux/mmc/host.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 9946733a34c6..3f03d9e8a3a4 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -177,11 +177,15 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
{
struct mmc_command cmd = {};
int i, err = 0;
+ s64 cmd1_ms;
+ ktime_t time_start;

cmd.opcode = MMC_SEND_OP_COND;
cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;

+ time_start = ktime_get();
+
for (i = 100; i; i--) {
err = mmc_wait_for_cmd(host, &cmd, 0);
if (err)
@@ -211,6 +215,13 @@ int mmc_send_op_cond(struct mmc_host *host, u32
ocr, u32 *rocr)
cmd.arg = cmd.resp[0] | BIT(30);
}

+ cmd1_ms = ktime_to_ms(ktime_sub(ktime_get(), time_start));
+ if (!err && cmd1_ms > host->cmd1_ms) {
+ pr_warn("%s: CMD1 timeout increased to %lld ms, loop=%d\n",
+ mmc_hostname(host), cmd1_ms, i);
+ host->cmd1_ms = cmd1_ms;
+ }
+
if (rocr && !mmc_host_is_spi(host))
*rocr = cmd.resp[0];

diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 7afb57cab00b..c2ca3bb05620 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -301,6 +301,7 @@ struct mmc_host {
unsigned int f_min;
unsigned int f_max;
unsigned int f_init;
+ s64 cmd1_ms;
u32 ocr_avail;
u32 ocr_avail_sdio; /* SDIO-specific OCR */
u32 ocr_avail_sd; /* SD-specific OCR */
--
2.25.1