Re: [PATCH] arm: flush: don't abuse pfn_valid() to check if pfn is in RAM

From: Robin Murphy
Date: Wed Jan 31 2024 - 16:24:25 EST


On 2024-01-31 7:00 pm, Russell King (Oracle) wrote:
On Wed, Jan 31, 2024 at 06:39:31PM +0000, Robin Murphy wrote:
On 31/01/2024 12:59 pm, Yongqiang Liu wrote:
@@ -292,7 +293,7 @@ void __sync_icache_dcache(pte_t pteval)
/* only flush non-aliasing VIPT caches for exec mappings */
return;
pfn = pte_pfn(pteval);
- if (!pfn_valid(pfn))
+ if (!memblock_is_map_memory(PFN_PHYS(pfn)))
return;
folio = page_folio(pfn_to_page(pfn));

Hmm, it's a bit odd in context, since pfn_valid() obviously pairs with this
pfn_to_page(), whereas it's not necessarily clear that
memblock_is_map_memory() implies pfn_valid().

However, in this case we're starting from a PTE - rather than going off to
do a slow scan of memblock to determine whether a round-trip through
page_address() is going to give back a mapped VA, can we not trivially
identify that from whether the PTE itself is valid?

Depends what you mean by "valid". If you're referring to pte_valid()
and L_PTE_VALID then no.

On 32-bit non-LPAE, the valid bit is the same as the present bit, and
needs to be set for the PTE to not fault. Any PTE that is mapping
something will be "valid" whether it is memory or not, whether it is
backed by a page or not.

pfn_valid() should be telling us whether the PFN is suitable to be
passed to pfn_to_page(), and if we have a situation where pfn_valid()
returns true, but pfn_to_page() returns an invalid page, then that in
itself is a bug that needs to be fixed and probably has far reaching
implications for the stability of the kernel.

Right, the problem here seems to be the opposite one, wherein we *do* often have a valid struct page for an address which is reserved and thus not mapped by the kernel, but seemingly we then take it down a path which assumes anything !PageHighmem() is lowmem and dereferences page_address() without looking.

However I realise I should have looked closer at the caller, and my idea is futile since the PTE here is for a userspace mapping, not a kernel VA, and is already pte_valid_user() && !pte_special(). Plus the fact that the stack trace indicates an mmap() path suggests it most likely is a legitimate mapping of some no-map carveout or MMIO region. Oh well. My first point still stands, though - I think at least a comment to clarify that assumption would be warranted.

Thanks,
Robin.