Hi Loic,

I think I spoke too fast. We don't need to worry about the polling mode, as the context_wp update would happen at some point in time, and that does not require a dma_wmb after updating the context wp.
On 5/4/2022 8:58 AM, Manivannan Sadhasivam wrote:
On Wed, May 04, 2022 at 11:25:33AM +0200, Loic Poulain wrote:
On Wed, 4 May 2022 at 10:17, Manivannan Sadhasivam
<manivannan.sadhasivam@xxxxxxxxxx> wrote:
Hi Loic,
On Wed, May 04, 2022 at 09:21:20AM +0200, Loic Poulain wrote:
Hi Mani,

On Mon, 2 May 2022 at 12:42, Manivannan Sadhasivam
<manivannan.sadhasivam@xxxxxxxxxx> wrote:

The endpoint device will only read the context wp when the host rings
the doorbell.

Are we sure about this statement? What if we update ctxt_wp while the
device is still processing the previous ring? Is it going to continue
processing the new ctxt_wp, or wait for a new doorbell interrupt? What
about burst mode, in which we don't ring at all (ring_db is a no-op)?

Good point. I think my statement was misleading. But still, this scenario won't
happen, as per my understanding. Please see below.
And moreover, the doorbell write is using writel(). This
guarantees that the prior writes will be completed before ringing the
doorbell.

Yes, but the barrier is to ensure that the descriptor/ring content is
updated before we actually pass it to device ownership; it's not about
ordering with the doorbell write, but the memory-coherent ones.

I see a clear data dependency between writing the ring element and updating the
context pointer. For instance,
```
struct mhi_ring_element *mhi_tre;
mhi_tre = ring->wp;
/* Populate mhi_tre */
...
/* Increment wp */
ring->wp += el_size;
/* Update ctx wp */
ring->ctx_wp = ring->iommu_base + (ring->wp - ring->base);
```
This is analogous to:
```
Read PTR A;
Update PTR A;
Increment PTR A;
Write PTR A to PTR B;
```
Interesting point, but shouldn't it be more correct to translate it as:
```
1. Write PTR A to PTR B (mhi_tre);
2. Update PTR B DATA;
3. Increment PTR A;
4. Write PTR A to PTR C;
```
In that case, it looks like line 2. has no ordering constraint with 3.
and 4? Whereas the following guarantees it:
```
1. Write PTR A to PTR B (mhi_tre);
2. Update PTR B DATA;
3. Increment PTR A;
dma_wmb();
4. Write PTR A to PTR C;
```
To be honest, compiler optimization is beyond my knowledge, so I don't
know whether a specific compiler arch/version could mess up the
sequence or not. But this pattern is really close to what is described
for dma_wmb() usage in Documentation/memory-barriers.txt. That's why I
challenged this change and would be conservative, keeping the explicit
barrier.
Hmm. Since I was reading the memory model and going through the MHI code, I
_thought_ that this dma_wmb() is redundant. But I missed the fact that the
update to the memory pointed to by "wp" happens implicitly via a pointer, so
that won't qualify as a direct dependency.
Here, because of the data dependency due to "ring->wp", the CPU or compiler
won't be reordering the instructions. I think that's one of the reasons we never
hit any issue due to this.

You may be right here about the implicit ordering guarantee... So if
you're sure, I think it would deserve an inline comment to explain why
we don't need a memory barrier, as in the 'usual' DMA descriptor update
sequences.

I think the barrier makes sense now. Sorry for the confusion, and thanks for the
explanations.
Thanks,
Mani
Loic
You made a good point. After following your conversation: in case burst mode is enabled and we are
currently in polling mode, does it make sense to move the dma_wmb() to after updating the channel WP context?
The DB ring is going to get skipped when we are in polling mode.

Instead of:
```
dma_wmb();
*ring->ctxt_wp = cpu_to_le64(db);
```
do:
```
*ring->ctxt_wp = cpu_to_le64(db);
dma_wmb();
```
Thanks,
Hemant