RE: pcm|dmaengine|imx-sdma race condition on i.MX6

From: Robin Gong
Date: Fri Aug 14 2020 - 04:45:25 EST


On 2020/08/13 19:23: Richard Leitner <richard.leitner@xxxxxxxxxxx> wrote:
> Hi,
> we've found a race condition with the PCM on the i.MX6 which results in an
> -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN).
>
> A possible reproduction may look like the following reduced call graph during a
> PCM capture:
>
> us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES)
> - wait_for_avail()
> - schedule_timeout()
> -> snd_pcm_update_hw_ptr0()
> - snd_pcm_update_state: EPIPE (XRUN)
> - sdma_disable_channel_async() # get's scheduled away due to sleep us
> <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE us ->
> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) us ->
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames
> - sdma_prep_dma_cyclic()
> - sdma_load_context() # not loaded as context_loaded is 1
> - wait_for_avail()
> - schedule_timeout()
> # now the sdma_channel_terminate_work() comes back and sets #
> context_loaded = false and frees in vchan_dma_desc_free_list().
> us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?))
Seems the write error caused by context_loaded not set to false before
next transfer start? If yes, please have a try with the 03/04 of the below
patch set, anyway, could you post your failure log?
https://lkml.org/lkml/2020/8/11/111

>
>
> What we have found out, based on our understanding:
> The dmaengine docu states that a dmaengine_terminate_async() must be
> followed by a dmaengine_synchronize().
> However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is
> called (for performance reasons and because it might be called from an
> interrupt handler).
>
> In our tests, we saw that the user-space immediately calls
> ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun
> (previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In
> our case (imx-sdma.c), the terminate really happens asynchronously with a
> worker thread which is not awaited/synchronized by the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) call.
>
> Since the syscall immediately enters an atomic context
> (snd_pcm_stream_lock_irq()), we are not able to flush the work of the
> termination worker from within the DMA context. This leads to an
> unterminated DMA getting re-initialized and then terminated.
>
> On the i.MX6 platform the problem is (if I got it correctly) that the
> sdma_channel_terminate_work() called after the -EPIPE gets scheduled away
> (for the 1-2ms sleep [1]). During that time the userspace already sends in the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) and
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> As none of them are anyhow synchronized to the terminate_worker the
> vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are
> executed during the wait_for_avail() [4] of the
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
>
> To make sure we identified the problem correctly we've tested to add a
> "dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This fixed the
> race condition in all our tests. (Before we were able to reproduce it in 100% of
> the test runs).
>
> Based on our understanding, there are two different points to ensure the
> termination:
> Either ensure that the termination is finished within the previous
> SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or finishing
> it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioclts)
> before entering the atomic context (from the PCM context).
>
> We initially thought about implementing the first approach, basically splitting
> up the dma_device terminate_all operation into a sync
> (busy-wait) and a async one. This would align the operations with the
> DMAengine interface and would enable a sync termination variant from atomic
> contexts.
> However, we saw that the dma_free_attrs() function has a WARN_ON on irqs
> disabled, which would be the case for the sync variant.
> Side note: We found this issue on the current v5.4.y LTS branch, but it also
> affects v5.8.y.
>
> Any feedback or pointers how we may fix the problem are warmly welcome!
> If anything is unclear please just ask :-)
>
> regards;
> Richard Leitner
> Benjamin Bara
>
> [1]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1066&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=D9F%2FRUG27xv9nv8J1KtrLtld2eaI6gsXiWIAIgk
> Avjw%3D&amp;reserved=0
> [2]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1071&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=0EKDVgzOZzL7TpX4ykhqjvpz5ryUHUpWw7frRe
> cksBU%3D&amp;reserved=0
> [3]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1072&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=aIhatvb1ocQqyYCVFEg71LgJlRBoVusbDFPIxnte
> PuY%3D&amp;reserved=0
> [4]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_lib.c%23L1
> 825&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7e
> 7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6373
> 29145824073919&amp;sdata=y0Udbd%2FKGaVgqLrcp6fNOlMlFCGHCMfojkpp
> B4HzUuE%3D&amp;reserved=0
> [5]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_native.c%2
> 3L3226&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f
> 7e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63
> 7329145824073919&amp;sdata=ch3BQ5DDGU5HWXqIZSvUeFnBoRoP%2BMM
> HEpnk8mIfWj8%3D&amp;reserved=0