Re: [PATCH 02/13] dmaengine: edma: Optimize memcpy operation

From: Vinod Koul
Date: Wed Oct 14 2015 - 23:56:22 EST


On Wed, Oct 14, 2015 at 06:02:18PM +0300, Peter Ujfalusi wrote:
> On 10/14/2015 05:41 PM, Vinod Koul wrote:
> > On Wed, Oct 14, 2015 at 04:12:13PM +0300, Peter Ujfalusi wrote:
> >> @@ -1320,41 +1317,92 @@ static struct dma_async_tx_descriptor *edma_prep_dma_memcpy(
> >> struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
> >> size_t len, unsigned long tx_flags)
> >> {
> >> - int ret;
> >> + int ret, nslots;
> >> struct edma_desc *edesc;
> >> struct device *dev = chan->device->dev;
> >> struct edma_chan *echan = to_edma_chan(chan);
> >> - unsigned int width;
> >> + unsigned int width, pset_len;
> >>
> >> if (unlikely(!echan || !len))
> >> return NULL;
> >>
> >> - edesc = kzalloc(sizeof(*edesc) + sizeof(edesc->pset[0]), GFP_ATOMIC);
> >> + if (len < SZ_64K) {
> >> + /*
> >> + * Transfer size less than 64K can be handled with one paRAM
> >> + * slot. ACNT = length
> >> + */
> >> + width = len;
> >> + pset_len = len;
> >> + nslots = 1;
> >> + } else {
> >> + /*
> >> + * Transfer size bigger than 64K will be handled with maximum of
> >> + * two paRAM slots.
> >> + * slot1: ACNT = 32767, length1: (length / 32767)
> >> + * slot2: the remaining amount of data.
> >> + */
> >> + width = SZ_32K - 1;
> >> + pset_len = rounddown(len, width);
> >> + /* One slot is enough for lengths multiple of (SZ_32K -1) */
> >
> > Hmm, so does this mean that if I have a 140K transfer, it will do two 64K
> > chunks in the 1st slot and 12K in the second slot?
>
> Not exactly. If the size is less than 64K it can be done with one 'burst', but
> if it is bigger we need two sets of transfers:
> 1. 32K blocks
> 2. the remaining data
>
> so in case of 140K:
> 4 x 32K followed by 12K

Okay, this part wasn't very clear to me. Can you please add a comment
explaining this bit?

>
> >
> > Is there a limit on 'blocks' of 64K we can do here?
>
> The limit is 32767 blocks of 32K.
>
> The 64K burst is only possible if the whole transfer is less than 64K.
> With the ACNT counter we can transfer 64K - 1 bytes, but if this is not enough
> we need to use the BCNT counter, and for that to work the distance between
> the start of 'slot n' and the start of 'slot n+1' needs to be less than 32K.
> This is why we transfer the 32K 'blocks' first, followed by the
> remaining data.

Okay, IIUC we have the option of a single burst using one slot if it's less
than 64K; otherwise we split into 32K chunks using 2 slots, or would it be N
slots in that case?

Really need more documentation here :)
--
~Vinod