Re: [PATCH] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST

From: David Hildenbrand
Date: Wed Jun 11 2025 - 04:23:34 EST


On 11.06.25 10:03, Marek Szyprowski wrote:
On 11.06.2025 04:38, Alistair Popple wrote:
On Tue, Jun 10, 2025 at 06:18:09PM +0200, Marek Szyprowski wrote:
On 04.06.2025 05:21, Alistair Popple wrote:
The PFN_MAP flag is no longer used for anything, so remove it.
The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been
used so also remove them. The last user of PFN_SPECIAL was removed
by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
support").

Signed-off-by: Alistair Popple <apopple@xxxxxxxxxx>
Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Cc: gerald.schaefer@xxxxxxxxxxxxx
Cc: dan.j.williams@xxxxxxxxx
Cc: jgg@xxxxxxxx
Cc: willy@xxxxxxxxxxxxx
Cc: david@xxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: nvdimm@xxxxxxxxxxxxxxx
Cc: jhubbard@xxxxxxxxxx
Cc: hch@xxxxxx
Cc: zhang.lyra@xxxxxxxxx
Cc: debug@xxxxxxxxxxxx
Cc: bjorn@xxxxxxxxxx
Cc: balbirs@xxxxxxxxxx
Cc: lorenzo.stoakes@xxxxxxxxxx
Cc: John@xxxxxxxxxx

---

Splitting this off from the rest of my series[1] as a separate clean-up
for consideration for the v6.16 merge window as suggested by Christoph.

[1] - https://lore.kernel.org/linux-mm/cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apopple@xxxxxxxxxx/
---
include/linux/pfn_t.h | 31 +++----------------------------
mm/memory.c | 2 --
tools/testing/nvdimm/test/iomap.c | 4 ----
3 files changed, 3 insertions(+), 34 deletions(-)
This patch landed in today's linux-next as commit 28be5676b4a3 ("mm:
remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST"). In my tests
I've noticed that it breaks operation of all RISC-V 64bit boards on my
test farm (VisionFive2, BananaPiF3 as well as QEMU's Virt machine). I've
isolated the changes responsible for this issue, see the inline comments
in the patch below. Here is an example of the issues observed in the
logs from those machines:
Thanks for the report. I'm really confused by this because this change should
just be removal of dead code - nothing sets any of the removed PFN_* flags
AFAICT.

I don't have access to any RISC-V hardwdare but you say this reproduces under
qemu - what do you run on the system to cause the error? Is it just a simple
boot and load a module or are you running selftests or something else?

It fails a simple boot test. Here is a detailed instruction how to
reproduce this issue with the random Debian rootfs image found on the
internet (tested on Ubuntu 22.04, with next-20250610
kernel source):

riscv is one of the archs where pte_mkdevmap() will *not* set the pte as special. (I
raised this recently in the original series, it's all a big mess)

So, before this change here, pfn_t_devmap() would have returned "false" if only
PFN_DEV was set, now it would return "true" if only PFN_DEV is set.

Consequently, in insert_pfn() we would have done a pte_mkspecial(), now we do a
pte_mkdevmap() -- again, which does not imply "special" on riscv.

riscv selects CONFIG_ARCH_HAS_PTE_SPECIAL, so if !pte_special(), it's considered as
normal.

Would the following fix your issue?


diff --git a/mm/memory.c b/mm/memory.c
index 8eba595056fe3..0e972c3493692 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -589,6 +589,10 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
{
unsigned long pfn = pte_pfn(pte);
+ /* TODO: remove this crap and set pte_special() instead. */
+ if (pte_devmap(pte))
+ return NULL;
+
if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
if (likely(!pte_special(pte)))
goto check_pfn;
@@ -598,16 +602,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return NULL;
if (is_zero_pfn(pfn))
return NULL;
- if (pte_devmap(pte))
- /*
- * NOTE: New users of ZONE_DEVICE will not set pte_devmap()
- * and will have refcounts incremented on their struct pages
- * when they are inserted into PTEs, thus they are safe to
- * return here. Legacy ZONE_DEVICE pages that set pte_devmap()
- * do not have refcounts. Example of legacy ZONE_DEVICE is
- * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
- */
- return NULL;
print_bad_pte(vma, addr, pte, NULL);
return NULL;


But, I would have thought the later patches in Alistairs series would sort that out
(where we remove pte_devmap() ... )

--
Cheers,

David / dhildenb