RE: [PATCH 1/1] iommu/vt-d: Revert ATS timing change to fix boot failure
From: Tian, Kevin
Date: Wed Apr 16 2025 - 22:23:35 EST
> From: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Wednesday, April 16, 2025 3:36 PM
>
> Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement
> to probe path") changed the PCI ATS enablement logic to run earlier,
> specifically before the default domain attachment.
>
> On some client platforms, this change resulted in boot failures, causing
> the kernel to panic with the following message and call trace:
>
> Kernel panic - not syncing: DMAR hardware is malfunctioning
> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175
> Call Trace:
> <TASK>
> dump_stack_lvl+0x6f/0xb0
> dump_stack+0x10/0x16
> panic+0x10a/0x2b7
> iommu_enable_translation.cold+0xc/0xc
> intel_iommu_init+0xe39/0xec0
> ? trace_hardirqs_on+0x1e/0xd0
> ? __pfx_pci_iommu_init+0x10/0x10
> pci_iommu_init+0xd/0x40
> do_one_initcall+0x5b/0x390
> kernel_init_freeable+0x26d/0x2b0
> ? __pfx_kernel_init+0x10/0x10
> kernel_init+0x15/0x120
> ret_from_fork+0x35/0x60
> ? __pfx_kernel_init+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> RIP: 1f0f:0x0
> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000
> RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084
> RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000
> RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66
> R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66
> R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000
> </TASK>
> ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]---
>
> Fix this by reverting the timing change for ATS enablement introduced by
> the offending commit and restoring the previous behavior.
>
It's unclear how this timing change relates to the dumped stack. Are
there more details on how they are related?