Hi Baolu,
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@xxxxxxxxxxxxxxx]
Sent: Thursday, April 8, 2021 12:32 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
<longpeng2@xxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx;
linux-kernel@xxxxxxxxxxxxxxx
Cc: baolu.lu@xxxxxxxxxxxxxxx; David Woodhouse <dwmw2@xxxxxxxxxxxxx>; Nadav
Amit <nadav.amit@xxxxxxxxx>; Alex Williamson <alex.williamson@xxxxxxxxxx>;
Kevin Tian <kevin.tian@xxxxxxxxx>; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>;
stable@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage
Hi Longpeng,
On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
Dept.) wrote:
Hi Baolu,orNOT-present ( e.g. create a totally new superpage mapping ), but we only need to
-----Original Message-----
From: Lu Baolu [mailto:baolu.lu@xxxxxxxxxxxxxxx]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
<longpeng2@xxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx;
linux-kernel@xxxxxxxxxxxxxxx
Cc: baolu.lu@xxxxxxxxxxxxxxx; David Woodhouse <dwmw2@xxxxxxxxxxxxx>;
Nadav Amit <nadav.amit@xxxxxxxxx>; Alex Williamson
<alex.williamson@xxxxxxxxxx>; Kevin Tian <kevin.tian@xxxxxxxxx>;
Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>; stable@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
superpage
Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
diff --git a/drivers/iommu/intel/iommu.cdmar_domain *domain,
b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int
hardware_largepage_caps(struct
* removed to make room for superpage(s).
* We're adding new large pages, so make sure
* we don't remove their parent tables.
+ *
+ * We also need to flush the iotlb before creating
+ * superpage to ensure it does not perserves any
+ * obsolete info.
*/
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ if (dma_pte_present(pte)) {
The dma_pte_free_pagetable() clears a batch of PTEs. So checking
current PTE is insufficient. How about removing this check and always
performing cache invalidation?
Um...the PTE here may be present( e.g. 4K mapping --> superpage mapping )
call free_pagetable and flush_iotlb in the former case, right ?
But this code covers multiple PTEs and perhaps crosses the page boundary.
How about moving this code into a separated function and check PTE presence
there. A sample code could look like below: [compiled but not tested!]
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index
d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct
dmar_domain *domain,
return level;
}
+/*
+ * Ensure that old small page tables are removed to make room for
superpage(s).
+ * We're going to add new large pages, so make sure we don't remove
their parent
+ * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+ unsigned long start_pfn,
+ unsigned long end_pfn, int level) {
Maybe "swith_to" will lead people to think "remove old and then setup new", so how about something like "remove_room_for_super_page" or "prepare_for_super_page" ?
+ unsigned long lvl_pages = lvl_to_nr_pages(level);
+ struct dma_pte *pte = NULL;
+ int i;
+
+ while (start_pfn <= end_pfn) {
start_pfn < end_pfn ?
+ if (!pte)
+ pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+ if (dma_pte_present(pte)) {
+ dma_pte_free_pagetable(domain, start_pfn,
+ start_pfn + lvl_pages - 1,
+ level + 1);
+
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i],
domain,
+ start_pfn,
lvl_pages,
+ 0, 0);
+ }
+
+ pte++;
+ start_pfn += lvl_pages;
+ if (first_pte_in_page(pte))
+ pte = NULL;
+ }
+}
+
static int
__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
unsigned long phys_pfn, unsigned long nr_pages, int prot)
@@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain,
unsigned long iov_pfn,
return -ENOMEM;
/* It is large page*/
if (largepage_lvl > 1) {
- unsigned long nr_superpages, end_pfn;
+ unsigned long end_pfn;
pteval |= DMA_PTE_LARGE_PAGE;
- lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
- nr_superpages = nr_pages / lvl_pages;
- end_pfn = iov_pfn + nr_superpages *
lvl_pages - 1;
-
- /*
- * Ensure that old small page tables are
- * removed to make room for superpage(s).
- * We're adding new large pages, so make
sure
- * we don't remove their parent tables.
- */
- dma_pte_free_pagetable(domain, iov_pfn,
end_pfn,
- largepage_lvl +
1);
+ end_pfn = ((iov_pfn + nr_pages) &
level_mask(largepage_lvl)) - 1;
+ switch_to_super_page(domain, iov_pfn,
end_pfn, largepage_lvl);
} else {
pteval &=
~(uint64_t)DMA_PTE_LARGE_PAGE;
}
I will send you the diff patch off list. Any thoughts?
The solution looks good to me.
It's free for you to send this patch if you want, I'll send V2 if you have no plan to send it :)
Best regards,
baolu
+ int i;
+
+ dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+ largepage_lvl + 1);
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ iov_pfn, nr_pages, 0, 0);
+
Best regards,
baolu