Re: [PATCH v2 1/1] iommu/vt-d: Fix kdump kernels boot failure with scalable mode

From: Baolu Lu
Date: Thu Aug 18 2022 - 07:13:29 EST


On 2022/8/18 16:32, Tian, Kevin wrote:
From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Sent: Wednesday, August 17, 2022 9:11 AM

The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel boot failure with DMAR faults if the IOMMU was
enabled in scalable mode by the previous kernel.

The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.

The existing copying code consumes a bit in the context entry as a mark
of copied entry. This marker needs to work for the old format as well as
for extended context entries. It's hard to find such a bit for both

The 2nd sentence "This marker..." is misleading. better removed.

Okay. I will make it like "It needs to work for ...".


legacy and scalable mode context entries. This replaces it with a per-
IOMMU bitmap.

Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID
support")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Jerry Snitselaar <jsnitsel@xxxxxxxxxx>
Tested-by: Wen Jin <wen.jin@xxxxxxxxx>
Signed-off-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
...
@@ -2735,8 +2693,8 @@ static int copy_translation_tables(struct
intel_iommu *iommu)
bool new_ext, ext;

rtaddr_reg = dmar_readq(iommu->reg + DMAR_RTADDR_REG);
- ext = !!(rtaddr_reg & DMA_RTADDR_RTT);
- new_ext = !!ecap_ecs(iommu->ecap);
+ ext = !!(rtaddr_reg & DMA_RTADDR_SMT);
+ new_ext = !!ecap_smts(iommu->ecap);

should be !!sm_supported()

Not really. The IOMMU was setup by the previous kernel. Here we just
check whether the scalable mode was enabled there.



/*
* The RTT bit can only be changed when translation is disabled,
@@ -2747,6 +2705,10 @@ static int copy_translation_tables(struct
intel_iommu *iommu)
if (new_ext != ext)
return -EINVAL;

+ iommu->copied_tables = bitmap_zalloc(BIT_ULL(16), GFP_KERNEL);
+ if (!iommu->copied_tables)
+ return -ENOMEM;
+
old_rt_phys = rtaddr_reg & VTD_PAGE_MASK;
if (!old_rt_phys)
return -EINVAL;

Out of curiosity. What is the rationale that we copy root table and
context tables but not pasid tables?

We only copy the context table and reconstruct it when the default
domain is attached. Before that, there's no need to reconstruct the
pasid table, hence it's safe to use the previous pasid tables.

Best regards,
baolu