Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

From: Leizhen (ThunderTown)
Date: Tue Aug 22 2017 - 21:22:30 EST




On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>> {
>> if (queue_full(q))
>> return -ENOSPC;
>>
>> queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> - queue_inc_prod(q);
>> +
>> + /*
>> + * We don't want too many commands to be delayed, this may lead the
>> + * followed sync command to wait for a long time.
>> + */
>> + if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> + queue_inc_swprod(q);
>> + } else {
>> + queue_inc_prod(q);
>> + q->nr_delay = 0;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>> static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>> struct arm_smmu_cmdq_ent *ent)
>> {
>> + int optimize = 0;
>> u64 cmd[CMDQ_ENT_DWORDS];
>> unsigned long flags;
>> bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>> return;
>> }
>>
>> + /*
>> + * All TLBI commands should be followed by a sync command later.
>> + * The CFGI commands is the same, but they are rarely executed.
>> + * So just optimize TLBI commands now, to reduce the "if" judgement.
>> + */
>> + if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> + (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> + optimize = 1;
>> +
>> spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> - while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> + while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>> if (queue_poll_cons(q, false, wfe))
>> dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>> }
>
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?
Hi, Joerg:
It's actullay guaranteed by the upper layer functions, for example:
static int arm_lpae_unmap(
...
unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep); //__arm_lpae_unmap will indirectly call arm_smmu_cmdq_issue_cmd to invalidate tlbs
if (unmapped)
io_pgtable_tlb_sync(&data->iop); //a tlb_sync wait all tlbi operations finished


I also described it in the next patch(2/5). Showed below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
alloc iova
map
process data
unmap
tlb invalidation and sync
free iova

What should be guaranteed is: "free iova" action is behind "unmap" and "tlbi
operation" action, that is what we are doing right now. This ensures that:
all TLBs of an iova-range have been invalidated before the iova reallocated.

Best regards,
LeiZhen

>
>
> Regards,
>
> Joerg
>
>
> .
>

--
Thanks!
BestRegards