Re: [PATCH v4] arm64: Enable vmalloc-huge with ptdump

From: Catalin Marinas
Date: Fri Jul 04 2025 - 07:22:18 EST


On Thu, Jun 26, 2025 at 10:55:24AM +0530, Dev Jain wrote:
> @@ -1301,16 +1314,39 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
> }
>
> table = pmd_offset(pudp, addr);
> + /*
> + * Isolate the PMD table; in case of race with ptdump, this helps
> + * us to avoid taking the lock in __pmd_free_pte_page().
> + *
> + * Static key logic:
> + *
> + * Case 1: If ptdump does static_branch_enable(), and after that we
> + * execute the if block, then this patches in the read lock, ptdump has
> + * the write lock patched in, therefore ptdump will never read from
> + * a potentially freed PMD table.
> + *
> + * Case 2: If the if block starts executing before ptdump's
> + * static_branch_enable(), then no locking synchronization
> + * will be done. However, pud_clear() + the dsb() in
> + * __flush_tlb_kernel_pgtable will ensure that ptdump observes an

Statements like "observes" really need to be relative, not absolute.
Like in "observes an empty PUD before/after ...".

> + * empty PUD. Thus, it will never walk over a potentially freed
> + * PMD table.
> + */
> + pud_clear(pudp);
> + __flush_tlb_kernel_pgtable(addr);
> + if (static_branch_unlikely(&ptdump_lock_key)) {
> + mmap_read_lock(&init_mm);
> + mmap_read_unlock(&init_mm);
> + }

This needs a formal model ;).

static_branch_enable() is called before the mmap_write_lock(), so even
if the above observes the new branch, it may do the read_unlock() before
the ptdump_walk_pgd() attempted the write lock. So your case 1
description is not entirely correct.

I don't get case 2. You want to ensure pud_clear() is observed by the
ptdump code before the pmd_free(). The DSB in the TLB flushing code
ensures some ordering between the pud_clear() and presumably something
that the ptdump code can observe as well. Would that be the mmap
semaphore? However, the read_lock would only be attempted if this code
is seeing the static branch update, which is not guaranteed. I don't
think it even matters since the lock may be released anyway before the
write_lock in ptdump.

For example, you do a pud_clear() above, skip the whole static branch.
The ptdump comes along on another CPU but does not observe the
pud_clear() since there's no synchronisation. It goes ahead and
dereferences it while this CPU does a pmd_free().

And I can't get my head around memory ordering, it doesn't look sound.
static_branch_enable() I don't think has acquire semantics, at least not
in relation to the actual branch update. The static_branch_unlikely()
test, again, not sure what guarantees it has (I don't think it has any
in relation to memory loads). Maybe you have worked it all out and is
fine but it needs a better explanation and ideally some simple formal
model. Show it's correct with a global variable first and then we can
optimise with static branches.

--
Catalin