Re: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3

From: Robin Murphy
Date: Thu Jul 26 2018 - 10:16:53 EST


On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:


On 2018/7/25 5:51, Robin Murphy wrote:
On 2018-07-12 7:18 AM, Zhen Lei wrote:
v2 -> v3: Add a bootup option "iommu_strict_mode" to make the
manager can choose which mode to be used. The first 5 patches
have not changed. + iommu_strict_mode= [arm-smmu-v3] +
0 - strict mode (default) + 1 - non-strict mode

v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova
parameter to pass the strict mode: 0, IOMMU_STRICT; 1,
IOMMU_NON_STRICT; Treat 0 as IOMMU_STRICT, so that the unmap
operation can compatible with other IOMMUs which still use strict
mode. In other words, this patch series will not impact other
IOMMU drivers. I tried add a new quirk
IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it can
not pass the strict mode of the domain from SMMUv3 driver to
io-pgtable module.

What exactly is the issue there? We don't have any problem with
other quirks like NO_DMA, and as I said before, by the time we're
allocating the io-pgtable in arm_smmu_domain_finalise() we already
know everything there is to know about the domain.

Because userspace can map/unamp and start devices to access memory
through VFIO. So that, the attacker can: 1. alloc memory 2. map 3.
unmap 4. free memory 5. repeatedly accesssing the just freed memory
base on the just unmapped iova, this attack may success if the freed
memory is reused by others and the mapping still staying in TLB

Right, but that's why we don't set non-strict mode on unmanaged domains; what I was asking about was specifically why "it can not pass the strict mode of the domain from SMMUv3 driver to io-pgtable module", because we don't get anywhere near io-pgtable until we already know whether the domain in question can allow lazy unmaps or not.

But if only root user can use VFIO, this is an unnecessary worry.
Then both normal and VFIO will use the same strict mode, so that the
new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.


Add a new member domain_non_strict in struct iommu_dma_cookie,
this member will only be initialized when the related domain and
IOMMU driver support non-strict mode.

v1: In common, a IOMMU unmap operation follow the below steps: 1.
remove the mapping in page table of the specified iova range 2.
execute tlbi command to invalid the mapping which is cached in
TLB 3. wait for the above tlbi operation to be finished 4. free
the IOVA resource 5. free the physical memory resource

This maybe a problem when unmap is very frequently, the
combination of tlbi and wait operation will consume a lot of
time. A feasible method is put off tlbi and iova-free operation,
when accumulating to a certain number or reaching a specified
time, execute only one tlbi_all command to clean up TLB, then
free the backup IOVAs. Mark as non-strict mode.

But it must be noted that, although the mapping has already been
removed in the page table, it maybe still exist in TLB. And the
freed physical memory may also be reused for others. So a
attacker can persistent access to memory based on the just freed
IOVA, to obtain sensible data or corrupt memory. So the VFIO
should always choose the strict mode.

Some may consider put off physical memory free also, that will
still follow strict mode. But for the map_sg cases, the memory
allocation is not controlled by IOMMU APIs, so it is not
enforceable.

Fortunately, Intel and AMD have already applied the non-strict
mode, and put queue_iova() operation into the common file
dma-iommu.c., and my work is based on it. The difference is that
arm-smmu-v3 driver will call IOMMU common APIs to unmap, but
Intel and AMD IOMMU drivers are not.

Below is the performance data of strict vs non-strict for NVMe
device: Randomly Read IOPS: 146K(strict) vs 573K(non-strict) Randomly Write IOPS: 143K(strict) vs 513K(non-strict)

How does that compare to passthrough performance? One thing I'm not
entirely clear about is what the realistic use-case for this is -
even if invalidation were infinitely fast, enabling translation
still typically has a fair impact on overall system performance in
terms of latency, power, memory bandwidth, etc., so I can't help
wonder what devices exist today for which performance is critical
and robustness* is unimportant, yet have crippled addressing
capabilities such that they can't just use passthrough.
I have no passthrough performance data yet, I will ask my team to do
it. But we have tested the Global bypass: Randomly Read IOPS: 744K,
and Randomly Write IOPS: is the same to non-strict.

I'm also not clear. But I think in most cases, the system does not
need to run at full capacity, but the system should have that
ability. For example, a system's daily load may only 30-50%, but the
load may increase to 80%+ on festival day.

Passthrough is not enough to support VFIO, and virtualization need
the later.

Huh? The whole point of passthrough mode is that the IOMMU can still be used for VFIO, but without imposing the overhead of dynamic mapping on host DMA.

Robin.

* I don't want to say "security" here, since I'm actually a lot
less concerned about the theoretical malicious endpoint/wild write
scenarios than the the much more straightforward malfunctioning
device and/or buggy driver causing use-after-free style memory
corruption. Also, I'm sick of the word "security"...

OKïWe really have no need to consider buggy devices.



Zhen Lei (6): iommu/arm-smmu-v3: fix the implementation of
flush_iotlb_all hook iommu/dma: add support for non-strict mode iommu/amd: use default branch to deal with all non-supported capabilities iommu/io-pgtable-arm: add support for non-strict
mode iommu/arm-smmu-v3: add support for non-strict mode iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ drivers/iommu/amd_iommu.c | 4 +-- drivers/iommu/arm-smmu-v3.c | 42
+++++++++++++++++++++++-- drivers/iommu/dma-iommu.c
| 25 +++++++++++++++ drivers/iommu/io-pgtable-arm.c
| 23 ++++++++------ include/linux/iommu.h
| 7 +++++ 6 files changed, 98 insertions(+), 15 deletions(-)


.