Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush

From: Uladzislau Rezki
Date: Mon Aug 11 2025 - 10:56:27 EST


On Mon, Aug 11, 2025 at 06:55:52AM -0700, Dave Hansen wrote:
> On 8/11/25 02:15, Uladzislau Rezki wrote:
> >> kernel_pte_work.list is global shared var, it would make the producer
> >> pte_free_kernel() and the consumer kernel_pte_work_func() to operate in
> >> serialized timing. In a large system, I don't think you design this
> >> deliberately 🙂
> >>
> > Sorry for jumping.
> >
> > Agree, unless it is never considered as a hot path or something that can
> > be really contented. It looks like you can use just a per-cpu llist to drain
> > thinks.
>
> Remember, the code that has to run just before all this sent an IPI to
> every single CPU on the system to have them do a (on x86 at least)
> pretty expensive TLB flush.
>
> If this is a hot path, we have bigger problems on our hands: the full
> TLB flush on every CPU.
>
> So, sure, there are a million ways to make this deferred freeing more
> scalable. But the code that's here is dirt simple and self contained. If
> someone has some ideas for something that's simpler and more scalable,
> then I'm totally open to it.
>
You could also have a look toward removing the &kernel_pte_work.lock.
Replace it by llist_add() on adding side and llist_for_each_safe(n, t, llist_del_all(&list))
on removing side. So you do not need guard(spinlock) stuff.

If i do not miss anything.

>
> But this is _not_ the place to add complexity to get scalability.
>
OK.

--
Uladzislau Rezki