Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush

From: Ethan Zhao
Date: Mon Aug 11 2025 - 21:17:37 EST




On 8/11/2025 9:55 PM, Dave Hansen wrote:
On 8/11/25 02:15, Uladzislau Rezki wrote:
kernel_pte_work.list is a global shared variable, so the producer
pte_free_kernel() and the consumer kernel_pte_work_func() would be
serialized against each other. On a large system, I don't think you
designed it this way deliberately 🙂

Sorry for jumping in.

Agreed, unless it is never considered a hot path or something that can
be really contended. It looks like you could just use a per-cpu llist to
drain things.
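
For readers unfamiliar with the per-cpu llist suggestion, here is a minimal
userspace sketch of the idea. It uses C11 atomics as an analog of the
kernel's lock-less list; it is not the patch's code and not the kernel
llist API. NR_SHARDS, shard_add(), shard_del_all() and drain_all() are
hypothetical names, and each "shard" stands in for one CPU's private list.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_SHARDS 4 /* hypothetical stand-in for the number of CPUs */

struct node {
	struct node *next;
};

struct shard_list {
	_Atomic(struct node *) first;
};

/* One list head per shard instead of a single global head. */
struct shard_list shards[NR_SHARDS];

/* Producer side: lock-free push onto one shard's list. */
void shard_add(unsigned int shard, struct node *n)
{
	struct shard_list *l = &shards[shard % NR_SHARDS];
	struct node *old = atomic_load_explicit(&l->first,
						memory_order_relaxed);

	/* On CAS failure, 'old' is reloaded with the current head. */
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&l->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Consumer side: detach one shard's whole list in a single exchange. */
struct node *shard_del_all(unsigned int shard)
{
	return atomic_exchange_explicit(&shards[shard % NR_SHARDS].first,
					(struct node *)NULL,
					memory_order_acquire);
}

/* Drain every shard; returns how many nodes were handed to 'release'. */
size_t drain_all(void (*release)(struct node *))
{
	size_t count = 0;

	for (unsigned int i = 0; i < NR_SHARDS; i++) {
		struct node *n = shard_del_all(i);

		while (n) {
			struct node *next = n->next; /* read before release */

			if (release)
				release(n);
			count++;
			n = next;
		}
	}
	return count;
}
```

Producers touching different shards never share a cache line for the list
head, so only the drain step walks all of them. Whether that is worth the
extra code here is exactly what the rest of this thread argues about.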

Remember, the code that has to run just before all this sent an IPI to
every single CPU on the system to have them do a (on x86 at least)
pretty expensive TLB flush.

It can easily be identified as a bottleneck by multi-CPU stress tests
that involve frequent process creation and destruction, similar to a
heavily loaded multi-process Apache web server. Hot or cold path?

If this is a hot path, we have bigger problems on our hands: the full
TLB flush on every CPU.
Perhaps not "we": an IPI-driven TLB flush does not seem to be a mechanism
shared by all CPU architectures, at least not ARM as far as I know.


So, sure, there are a million ways to make this deferred freeing more
scalable. But the code that's here is dirt simple and self contained. If
someone has some ideas for something that's simpler and more scalable,
then I'm totally open to it.

But this is _not_ the place to add complexity to get scalability.
At least, please don't add a bottleneck. How complex would that be?

Thanks,
Ethan