Re: [PATCH v3 2/3] mm: Change ghes code to allow poison of non-struct pfn
From: Ira Weiny
Date: Wed Oct 22 2025 - 11:01:44 EST
Shuai Xue wrote:
>
>
> On 2025/10/22 01:19, Luck, Tony wrote:
> >>> pfn = PHYS_PFN(physical_addr);
> >>> - if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
> >>
> >> Tony,
> >>
> >> I'm not an SGX expert but does this break SGX by removing
> >> arch_is_platform_page()?
> >>
> >> See:
> >>
> >> 40e0e7843e23 ("x86/sgx: Add infrastructure to identify SGX EPC pages")
> >> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> >>
> > Ira,
> >
> > I think this deletion makes the GHES code always call memory_failure()
> > instead of bailing out here on "bad" page frame numbers.
> >
> > That centralizes the checks for different types of memory into
> > memory_failure().
> >
> > -Tony
>
> Hi, Tony, Ankit and Ira,
>
> Finally, we're seeing other use cases that need to handle errors for
> non-struct page PFNs :)
>
> IMHO, non-struct page PFNs are common in production environments.
> Besides NVIDIA Grace GPU device memory, we also use reserved DRAM memory
> managed by a separate VMEM allocator.

Can you elaborate on this more?

Ira

>
> This VMEM allocator is designed
> for virtual machine memory allocation, significantly reducing kernel
> memory management overhead by minimizing page table maintenance.
>
> To enable hardware error isolation for these memory pages, we've already
> removed this sanity check internally. This change makes memory_failure()
> the central point for handling all memory types, which is a much cleaner
> architecture.
>
> Reviewed-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
>
> Thanks.
> Shuai