Re: [PATCH v2 1/2] mm: Allow lockless kernel pagetable walking

From: David Hildenbrand
Date: Tue Jun 10 2025 - 09:44:40 EST


On 10.06.25 15:35, Lorenzo Stoakes wrote:
On Tue, Jun 10, 2025 at 03:31:56PM +0200, David Hildenbrand wrote:
On 10.06.25 15:27, Lorenzo Stoakes wrote:
On Tue, Jun 10, 2025 at 03:24:16PM +0200, David Hildenbrand wrote:
On 10.06.25 14:07, Lorenzo Stoakes wrote:
OK so I think the best solution here is to just update check_ops_valid(), which
was kind of sucky anyway (we check everywhere but walk_page_range_mm() to
enforce the install pte thing).

Let's do something like:

#define OPS_MAY_INSTALL_PTE (1<<0)
#define OPS_MAY_AVOID_LOCK (1<<1)

and update check_ops_valid() to take a flags or maybe 'capabilities' field.

Then check based on this e.g.:

if (ops->install_pte && !(capabilities & OPS_MAY_INSTALL_PTE))
	return false;

if (ops->walk_lock == PGWALK_NOLOCK && !(capabilities & OPS_MAY_AVOID_LOCK))
	return false;
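
For illustration only, the capability check could be sketched as a self-contained userspace mock along these lines. This is not kernel code: the reduced struct mm_walk_ops, the simplified install_pte signature, the enum values and the mock_install_pte()/example_usage() helpers are all assumptions made up for the example. Only the two OPS_MAY_* flags, PGWALK_NOLOCK and the two checks come from the proposal.

```c
/* Userspace mock, not kernel code: every type here is a simplified stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OPS_MAY_INSTALL_PTE	(1U << 0)
#define OPS_MAY_AVOID_LOCK	(1U << 1)

/* Stand-in for enum page_walk_lock; the numeric values are illustrative. */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_NOLOCK,
};

/* Reduced mock of struct mm_walk_ops: only the two fields being checked. */
struct mm_walk_ops {
	int (*install_pte)(unsigned long addr);	/* simplified signature */
	enum page_walk_lock walk_lock;
};

/* Reject ops that ask for more than the calling entry point permits. */
static bool check_ops_valid(const struct mm_walk_ops *ops,
			    unsigned int capabilities)
{
	if (ops->install_pte && !(capabilities & OPS_MAY_INSTALL_PTE))
		return false;

	if (ops->walk_lock == PGWALK_NOLOCK &&
	    !(capabilities & OPS_MAY_AVOID_LOCK))
		return false;

	return true;
}

/* Hypothetical callback, present only so install_pte can be non-NULL. */
static int mock_install_pte(unsigned long addr)
{
	(void)addr;
	return 0;
}

/* Each entry point passes only the capabilities it is willing to grant. */
void example_usage(void)
{
	const struct mm_walk_ops installer = {
		.install_pte = mock_install_pte,
		.walk_lock = PGWALK_RDLOCK,
	};

	assert(!check_ops_valid(&installer, 0));
	assert(check_ops_valid(&installer, OPS_MAY_INSTALL_PTE));
}
```

In this sketch the policy lives with the caller: an entry point that passes 0 rejects both install_pte and PGWALK_NOLOCK, so a new lockless path would have to opt in explicitly.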


Hm. I mean, we really only want to allow this lockless check for
walk_kernel_page_table_range(), right?

Having a walk_kernel_page_table_range_lockless() might (or might not) be
better, to really only special-case this specific path.

Agree completely, Dev - let's definitely do this.


So, I am wondering if we should further start splitting the
kernel-page-table walker up from the mm walker, at least on the "entry"
function for now.

How do you mean?

In particular, "struct mm_walk_ops" does not quite make sense when not
actually walking a "real" mm.

So maybe we should start having a separate structure where *vma,
install_pte, walk_lock, hugetlb* do not even exist.

It might be a bit of churn, though ... not sure if there could be an easy
translation layer for now.
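
As a rough userspace sketch of what such a split plus translation layer might look like (nothing here exists in the kernel: struct kernel_walk_ops, kernel_ops_to_mm_ops(), count_pte() and the simplified callback signatures are hypothetical, and the mock struct mm_walk_ops is heavily reduced):

```c
/* Userspace mock, not kernel code: names and signatures are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Stand-in for enum page_walk_lock; the numeric values are illustrative. */
enum page_walk_lock {
	PGWALK_RDLOCK = 0,
	PGWALK_NOLOCK,
};

/* Simplified callback type; the real callbacks take different arguments. */
typedef int (*walk_entry_fn)(unsigned long addr, unsigned long next,
			     void *private_data);

/* Reduced mock of the existing mm walker ops. */
struct mm_walk_ops {
	walk_entry_fn pmd_entry;
	walk_entry_fn pte_entry;
	int (*install_pte)(unsigned long addr);
	enum page_walk_lock walk_lock;
};

/* Hypothetical kernel-only ops: no vma, install_pte, walk_lock or hugetlb. */
struct kernel_walk_ops {
	walk_entry_fn pmd_entry;
	walk_entry_fn pte_entry;
};

/*
 * Hypothetical translation layer: the kernel entry point, not the caller's
 * ops, decides the locking mode and never permits install_pte.
 */
static struct mm_walk_ops kernel_ops_to_mm_ops(const struct kernel_walk_ops *kops,
					       enum page_walk_lock lock)
{
	struct mm_walk_ops ops = {
		.pmd_entry = kops->pmd_entry,
		.pte_entry = kops->pte_entry,
		.install_pte = NULL,
		.walk_lock = lock,
	};

	return ops;
}

/* Example callback: counts visited entries through the private pointer. */
static int count_pte(unsigned long addr, unsigned long next, void *private_data)
{
	(void)addr;
	(void)next;
	++*(unsigned long *)private_data;
	return 0;
}

void example_translation(void)
{
	const struct kernel_walk_ops kops = { .pte_entry = count_pte };
	struct mm_walk_ops ops = kernel_ops_to_mm_ops(&kops, PGWALK_NOLOCK);
	unsigned long visited = 0;

	ops.pte_entry(0x1000UL, 0x2000UL, &visited);
	assert(visited == 1);
	assert(ops.install_pte == NULL);
}
```

With a shape like this, callers of the kernel walker could not even express install_pte or a lock mode, so the validity checks for that path would reduce to whatever the entry point hard-codes.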

But you know... I looove churn right? <3 <3 <3 :)))

That's a nice idea, but I think something that should be a follow up.

Yes, absolutely, just wanted to raise it :)

--
Cheers,

David / dhildenb