Re: [PATCH] dmatest: terminate all ongoing transfers before submitting new one

From: Andy Shevchenko
Date: Tue Oct 16 2012 - 05:35:50 EST


On Tue, Oct 16, 2012 at 11:56 AM, viresh kumar <viresh.kumar@xxxxxxxxxx> wrote:
> On Tue, Oct 16, 2012 at 2:15 PM, Andy Shevchenko
> <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>> The following error messages come if we have software LLP emulation enabled and
>> enough threads running.
>>
>> modprobe dmatest iterations=40
>> [ 168.048601] dmatest: Started 1 threads using dma0chan0
>> [ 168.054546] dmatest: Started 1 threads using dma0chan1
>> [ 168.060441] dmatest: Started 1 threads using dma0chan2
>> [ 168.066333] dmatest: Started 1 threads using dma0chan3
>> [ 168.072250] dmatest: Started 1 threads using dma0chan4
>> [ 168.078144] dmatest: Started 1 threads using dma0chan5
>> [ 168.084057] dmatest: Started 1 threads using dma0chan6
>> [ 168.089948] dmatest: Started 1 threads using dma0chan7
>> [ 170.032962] dma0chan1-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 170.041274] dma0chan0-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 170.597559] dma0chan2-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 171.085059] dma0chan7-copy0: #0: test timed out
>> [ 171.839710] dma0chan3-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 172.146071] dma0chan4-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 172.220802] dma0chan7-copy0: #1: got completion callback, but status is 'in progress'
>> [ 172.242049] dma0chan7-copy0: #2: got completion callback, but status is 'in progress'
>> [ 172.281063] dma0chan7-copy0: #3: got completion callback, but status is 'in progress'
>> [ 172.400866] dma0chan7-copy0: #4: got completion callback, but status is 'in progress'
>> [ 172.471799] dma0chan7-copy0: #5: got completion callback, but status is 'in progress'
>> [ 172.613996] dma0chan7-copy0: #6: got completion callback, but status is 'in progress'
>> [ 172.670286] dma0chan7-copy0: #7: got completion callback, but status is 'in progress'
>> [ 172.750763] dma0chan7-copy0: #8: got completion callback, but status is 'in progress'
>> [ 172.777452] dma0chan5-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 172.788740] dma0chan7-copy0: #9: got completion callback, but status is 'in progress'
>> [ 172.845156] dma0chan7-copy0: #10: got completion callback, but status is 'in progress'
>> [ 172.906593] dma0chan7-copy0: #11: got completion callback, but status is 'in progress'
>> [ 173.181515] dma0chan6-copy0: terminating after 40 tests, 0 failures (status 0)
>> [ 173.512838] dma0chan7-copy0: terminating after 40 tests, 12 failures (status 0)
>>
>> The patch fixes dmatest module to stop any ongoing transfer before submitting
>> new one. Perhaps there is a better solution and driver logic needs to be fixed
>> as well.
>>
>> After patch we will have
>>
>> modprobe dmatest iterations=50
>> [ 84.027375] dmatest: Started 1 threads using dma0chan0
>> [ 84.033282] dmatest: Started 1 threads using dma0chan1
>> [ 84.039182] dmatest: Started 1 threads using dma0chan2
>> [ 84.045089] dmatest: Started 1 threads using dma0chan3
>> [ 84.051003] dmatest: Started 1 threads using dma0chan4
>> [ 84.056916] dmatest: Started 1 threads using dma0chan5
>> [ 84.062828] dmatest: Started 1 threads using dma0chan6
>> [ 84.068714] dmatest: Started 1 threads using dma0chan7
>> [ 86.538284] dma0chan0-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 86.842221] dma0chan1-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 87.060460] dma0chan6-copy0: #0: test timed out
>> [ 87.065614] dma0chan7-copy0: #0: test timed out
>> [ 87.220321] dma0chan2-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 88.595061] dma0chan3-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 89.152170] dma0chan4-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 89.955059] dma0chan5-copy0: terminating after 50 tests, 0 failures (status 0)
>> [ 90.697073] dma0chan6-copy0: terminating after 50 tests, 1 failures (status 0)
>> [ 90.893422] dma0chan7-copy0: terminating after 50 tests, 1 failures (status 0)
>
> You still have failures. :(
Sure, but the point is that the 'in progress' issues are gone.

> Can you try with a large timeout value for the module.
I tried and the failures were gone.

> We must get to the root cause of these failures. There may be something more
> serious which is getting hidden due to this call to terminate().
My understanding is as follows. The software LLP emulation runs several
transactions per active descriptor. Because of the heavy load on the
CPU/DMA, some transactions do not complete within the given timeout.
dmatest then submits the next block to transfer without doing anything
about the previous one. Under some circumstances the new transfer is
queued, and immediately afterwards the callback function is called for
the _previous_ transfer. The check condition cannot tell which transfer
called the callback function.
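
The sequence above can be replayed in a small userspace sketch. This is
not dmatest code: struct test_thread, completion_cb() and
stale_completion_is_misattributed() are hypothetical names, and the
"hardware" is just an array of flags. It only illustrates how a single
done flag cannot tell a late completion of transfer #0 from the
completion of transfer #1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single completion flag a test thread waits on. */
struct test_thread {
	bool done;	/* set by the completion callback */
};

/* Completion callback: it only sets the flag and carries no transfer identity. */
static void completion_cb(struct test_thread *t)
{
	t->done = true;
}

/*
 * Replay the sequence and return true if the late completion of
 * transfer #0 is mistaken for the completion of transfer #1.
 */
static bool stale_completion_is_misattributed(void)
{
	struct test_thread t = { .done = false };
	bool xfer_finished[2] = { false, false };	/* what the "hardware" really did */

	/* Transfer #0 is submitted; the wait times out with the flag still clear. */
	assert(!t.done);

	/* Transfer #1 is submitted without terminating #0; the flag is re-armed. */
	t.done = false;

	/* Transfer #0 finally completes and its callback fires. */
	xfer_finished[0] = true;
	completion_cb(&t);

	/* The thread wakes up for #1: the flag is set, yet #1 is still in progress. */
	return t.done && !xfer_finished[1];
}
```

In this model the function returns true, which corresponds to the "got
completion callback, but status is 'in progress'" messages in the log.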

A rough solution is proposed by the current patch. Another solution is
to mark each transfer with an id and check the done flag and the
transfer id together.
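
A minimal userspace sketch of the id idea, under the assumption that the
callback can be handed the id of the transfer it belongs to. The names
struct tagged_done, tagged_arm(), tagged_complete() and
stale_completion_is_rejected() are hypothetical and do not exist in
dmatest or the dmaengine API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread completion state; not the actual dmatest structure. */
struct tagged_done {
	unsigned int expected_id;	/* id of the transfer the thread waits for */
	bool done;
};

/* Arm the state before submitting the transfer with the given id. */
static void tagged_arm(struct tagged_done *d, unsigned int id)
{
	d->expected_id = id;
	d->done = false;
}

/* Completion callback: each transfer reports its own id. Returns true if accepted. */
static bool tagged_complete(struct tagged_done *d, unsigned int id)
{
	if (id != d->expected_id)
		return false;	/* stale completion from an earlier transfer */
	d->done = true;
	return true;
}

/* Replay: #0 times out, #1 is submitted, then #0 completes late, then #1. */
static bool stale_completion_is_rejected(void)
{
	struct tagged_done d;
	bool stale_accepted;

	tagged_arm(&d, 0);	/* transfer #0 submitted, wait times out */
	tagged_arm(&d, 1);	/* transfer #1 submitted */
	assert(d.expected_id == 1);

	stale_accepted = tagged_complete(&d, 0);	/* late callback for #0 */
	if (stale_accepted || d.done)
		return false;

	return tagged_complete(&d, 1) && d.done;	/* real completion of #1 */
}
```

Here the late callback for transfer #0 leaves the done flag untouched, so
only the completion of transfer #1 wakes the waiter. A real
implementation would also need to keep the id and the flag consistent
with respect to the callback context, which this sketch does not model.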

> Unless there is a issue with software emulation of LLP, the only difference with
> s/w emulation is the transfers become slow.
Yep.

> Also, the proposed solution might hide some other important errors. We may need
> to terminate transfers when we found that an error is there in last transfers:
I think it could be better than the first solution, but what do you
think about marking each transfer with a corresponding id?


--
With Best Regards,
Andy Shevchenko