Re: [PATCH v2] ath9k: sleep for less time when unregistering hwrng

From: Gregory Erwin
Date: Fri Jun 24 2022 - 20:14:09 EST


On Fri, Jun 24, 2022 at 1:44 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> Even though hwrng provides a `wait` parameter, it doesn't work very well
> when waiting for a long time. There are numerous deadlocks that emerge
> related to shutdown. Work around this API limitation by waiting for a
> shorter amount of time and erroring more frequently. This commit also
> prevents hwrng from splatting messages to dmesg when there's a timeout
> and prevents calling msleep_interruptible() for tons of time when a
> thread is supposed to be shutting down, since msleep_interruptible()
> isn't actually interrupted by kthread_stop().
>
> Reported-by: Gregory Erwin <gregerwin256@xxxxxxxxx>
> Cc: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> Cc: Kalle Valo <kvalo@xxxxxxxxxx>
> Cc: Rui Salvaterra <rsalvaterra@xxxxxxxxx>
> Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c")
> Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@xxxxxxxxxxxxxx/
> Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@xxxxxxxxxxxxxx/
> Link: https://bugs.archlinux.org/task/75138
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> I do not have an ath9k and therefore I can't test this myself. The
> analysis above was done completely statically, with no dynamic tracing
> and just a bug report of symptoms from Gregory. So it might be totally
> wrong. Thus, this patch very much requires Gregory's testing. Please
> don't apply it until we have his `Tested-by` line.
>
> drivers/char/hw_random/core.c | 10 ++++++++--
> drivers/net/wireless/ath/ath9k/rng.c | 19 ++-----------------
> 2 files changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 16f227b995e8..af1c1905bb7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -513,8 +513,13 @@ static int hwrng_fillfn(void *unused)
>  			break;
>
>  		if (rc <= 0) {
> -			pr_warn("hwrng: no data available\n");
> -			msleep_interruptible(10000);
> +			int i;
> +
> +			for (i = 0; i < 100; ++i) {
> +				if (kthread_should_stop() ||
> +				    msleep_interruptible(10000 / 100))
> +					goto out;
> +			}
>  			continue;
>  		}
>
> @@ -529,6 +534,7 @@ static int hwrng_fillfn(void *unused)
>  		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
>  					   entropy >> 10);
>  	}
> +out:
>  	hwrng_fill = NULL;
>  	return 0;
>  }
> diff --git a/drivers/net/wireless/ath/ath9k/rng.c b/drivers/net/wireless/ath/ath9k/rng.c
> index cb5414265a9b..883110c66e5e 100644
> --- a/drivers/net/wireless/ath/ath9k/rng.c
> +++ b/drivers/net/wireless/ath/ath9k/rng.c
> @@ -52,20 +52,6 @@ static int ath9k_rng_data_read(struct ath_softc *sc, u32 *buf, u32 buf_size)
>  	return j << 2;
>  }
>
> -static u32 ath9k_rng_delay_get(u32 fail_stats)
> -{
> -	u32 delay;
> -
> -	if (fail_stats < 100)
> -		delay = 10;
> -	else if (fail_stats < 105)
> -		delay = 1000;
> -	else
> -		delay = 10000;
> -
> -	return delay;
> -}
> -
>  static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  {
>  	struct ath_softc *sc = container_of(rng, struct ath_softc, rng_ops);
> @@ -80,10 +66,9 @@ static int ath9k_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
>  			bytes_read += max & 3UL;
>  			memzero_explicit(&word, sizeof(word));
>  		}
> -		if (!wait || !max || likely(bytes_read) || fail_stats > 110)
> +		if (!wait || !max || likely(bytes_read) ||
> +		    ++fail_stats >= 100 || msleep_interruptible(5))
>  			break;
> -
> -		msleep_interruptible(ath9k_rng_delay_get(++fail_stats));
>  	}
>
>  	if (wait && !bytes_read && max)
> --
> 2.35.1
>

Jason,

This patch works as you described. Reading from /dev/hwrng now
consistently blocks for only 1.3s before returning an I/O error. The longest
that I observed 'ip link set wlan0 down' block was also about 1.3s, and
that was immediately after 'cat /dev/hwrng'. Additionally, the longest
that I observed wiphy_suspend() take to return was just under 100ms.
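
For anyone following along, the hwrng_fillfn() change in the quoted patch
boils down to splitting one long sleep into many short ones and checking
for a stop request between them. Below is a rough userspace sketch of that
pattern only, not the kernel code: sliced_sleep(), should_stop_fn,
struct stop_after and stop_after_limit() are hypothetical names invented
for illustration, and nanosleep() stands in for msleep_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for kthread_should_stop(): true means "stop now". */
typedef bool (*should_stop_fn)(void *ctx);

/*
 * Sleep for roughly total_ms, split into `slices` equal parts, polling
 * should_stop() before each part. Returns true if the whole delay elapsed
 * and false if a stop request cut it short.
 */
static bool sliced_sleep(unsigned int total_ms, unsigned int slices,
			 should_stop_fn should_stop, void *ctx)
{
	unsigned int slice_ms, i;
	struct timespec ts;

	assert(slices > 0);
	slice_ms = total_ms / slices;
	ts.tv_sec = slice_ms / 1000;
	ts.tv_nsec = (long)(slice_ms % 1000) * 1000000L;

	for (i = 0; i < slices; ++i) {
		if (should_stop(ctx))
			return false;
		nanosleep(&ts, NULL);
	}
	return true;
}

/* Hypothetical stop source: requests a stop once `limit` polls have passed. */
struct stop_after {
	unsigned int polls;
	unsigned int limit;
};

static bool stop_after_limit(void *ctx)
{
	struct stop_after *s = ctx;

	return s->polls++ >= s->limit;
}
```

With 100 slices, a stop request is noticed after at most one slice
(total_ms / 100) instead of after the full delay, which is the property
the patch relies on during shutdown.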

Tested-by: Gregory Erwin <gregerwin256@xxxxxxxxx>