Re: [BUG] mmc: core: adjust polling interval for CMD1

From: Jean Rene Dawin
Date: Fri Mar 04 2022 - 04:42:49 EST


Huijin Park wrote on Thu 3/03/22 21:16:
> Hi, sorry for the late reply.
> I guess the problem occurs depending on the eMMC model.
> Because I tested again and there was no problem.
> The eMMC model used for the test are as follows.
> (THGBMJG6C1LBAIL, KLM8G1GETF-B041)
> Anyway I guess the cause is interval time.
> I wrote a debug patch assuming that the reason was that some mmc models
> could not process CMD1 delivered at short intervals.
> I copied the polling function and adds interval minimum time parameter.
> I set the minimum time to 1ms. You can adjust it.
> Please test if there is no problem with the debug patch.
>
> >Hi,
> >
> >> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>:
> >>
> >> Ulf Hansson wrote on Tue 1/03/22 14:38:
> >>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>
> >>> From: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >>> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>>
> >>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >>> which is probably more inline with its previous value, to fix the reported
> >>> timeout problems.
> >>>
> >>> While at it, let's add a define for the timeout value, rather than using
> >>> a hard-coded value for it.
> >>>
> >>> Reported-by: Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>
> >>> Reported-by: H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>
> >>> Cc: Huijin Park <huijin.park@xxxxxxxxxxx>
> >>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >>> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >>> ---
> >>> drivers/mmc/core/mmc_ops.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >>> index d63d1c735335..1f57174b3cf3 100644
> >>> --- a/drivers/mmc/core/mmc_ops.c
> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >>> @@ -21,6 +21,7 @@
> >>>
> >>> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> >>> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> >>> +#define MMC_OP_COND_TIMEOUT_MS 2000 /* 2s */
> >>>
> >>> static const u8 tuning_blk_pattern_4bit[] = {
> >>> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >>> ocr, u32 *rocr)
> >>> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >>> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>>
> >>> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >>> + err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >>> + &__mmc_send_op_cond_cb, &cb_data);
> >>> if (err)
> >>> return err;
> >>>
> >>> --
> >>> 2.25.1
> >>
> >> Hi,
> >>
> >> thanks. But testing with this patch still gives the same errors:
> >>
> >> [ 52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >> [ 52.273380] mmc1: error -110 doing runtime resume
> >>
> >> and the system gets stuck eventually.
> >
> >Same result from my tests.
> >
> >BR and thanks,
> >Nikolaus
>
>
> From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
> From: Huijin Park <huijin.park@xxxxxxxxxxx>
> Date: Thu, 3 Mar 2022 20:43:22 +0900
> Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond
>
> This patch extends timeout to 2s and sets minimal interval time for
> checking op_cond stuck problem.
>
> Signed-off-by: Huijin Park <huijin.park@xxxxxxxxxxx>
>
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..ccad6379d183 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
>
> #define MMC_BKOPS_TIMEOUT_MS (120 * 1000) /* 120s */
> #define MMC_SANITIZE_TIMEOUT_MS (240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS ( 2 * 1000) /* 2s */
>
> static const u8 tuning_blk_pattern_4bit[] = {
> 0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
> return err;
> }
>
> +static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
> + unsigned int timeout_ms,
> + int (*busy_cb)(void *cb_data, bool *busy),
> + void *cb_data)
> +{
> + int err;
> + unsigned long timeout;
> + unsigned int udelay = udelay_min, udelay_max = 32768;
> + bool expired = false;
> + bool busy = false;
> +
> + timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
> + do {
> + /*
> + * Due to the possibility of being preempted while polling,
> + * check the expiration time first.
> + */
> + expired = time_after(jiffies, timeout);
> +
> + err = (*busy_cb)(cb_data, &busy);
> + if (err)
> + return err;
> +
> + /* Timeout if the device still remains busy. */
> + if (expired && busy) {
> + pr_err("%s: Card stuck being busy! %s\n",
> + mmc_hostname(host), __func__);
> + return -ETIMEDOUT;
> + }
> +
> + /* Throttle the polling rate to avoid hogging the CPU. */
> + if (busy) {
> + usleep_range(udelay, udelay * 2);
> + if (udelay < udelay_max)
> + udelay *= 2;
> + }
> + } while (busy);
> +
> + return 0;
> +}
> +
> static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
> {
> struct mmc_op_cond_busy_data *data = cb_data;
> @@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
> cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>
> - err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> + err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
> + &__mmc_send_op_cond_cb, &cb_data);
> if (err)
> return err;
>
> --
> 2.17.1
>

Hi,

thanks for the patch.
I started with a value of 10 ms :
err = ____mmc_poll_for_busy(host, 10, MMC_OP_COND_TIMEOUT_MS,
&__mmc_send_op_cond_cb, &cb_data);

and the result was agian

[ 23.349932] mmc1: Card stuck being busy! __mmc_poll_for_busy
[ 23.355936] mmc1: error -110 doing runtime resume

Same with 100 ms and 500 ms.

The messages seem to come from __mmc_poll_for_busy and not ____mmc_poll_for_busy.
Yet, changing the value in the call of ____mmc_poll_for_busy to 1000 ms changed the behaviour:

err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
&__mmc_send_op_cond_cb, &cb_data);

With this I didn't get any "stuck" errors and the board seemed stable.

Regards,
Jean Rene Dawin