Re: [PATCH v3 4/4] arm64: support batched/deferred tlb shootdown during page reclamation

From: Nadav Amit
Date: Thu Sep 15 2022 - 10:31:54 EST




> On Sep 14, 2022, at 11:42 PM, Barry Song <21cnbao@xxxxxxxxx> wrote:
>
>>
>> The very idea behind TLB deferral is the opportunity it (might) provide
>> to accumulate address ranges and cpu masks so that individual TLB flush
>> can be replaced with a more cost effective range based TLB flush. Hence
>> I guess unless address range or cpumask based cost effective TLB flush
>> is available, deferral does not improve the unmap performance as much.
>
>
> After sending a TLBI, if we wait for its completion, we have to get an ack
> from every CPU in the system, so TLBI does not scale. The point here is that
> we avoid waiting for each individual TLBI; instead, they are batched. If
> you read the benchmark in the commit log, you can see the large drop
> in the cost of swapping out a page.

Just a minor correction: arch_tlbbatch_flush() does not collect ranges.
On x86 it only accumulates a CPU mask.
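
To illustrate the point, here is a rough userspace sketch of what "only a
CPU mask" means. It is not the kernel code: the struct and function names
(tlb_batch_model, batch_add_mm, batch_flush) are hypothetical, the mask is
a plain 64-bit word instead of a cpumask, and the "flush" just counts the
CPUs that would be asked to do a full flush.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a deferred-flush batch (illustration only).
 * Note what is absent: there are no start/end address fields, so no
 * range information survives until flush time.
 */
struct tlb_batch_model {
	uint64_t cpumask;	/* bit N set: CPU N may hold stale entries */
};

/* Defer a flush for one unmapped page: only OR in the CPUs the mm ran on. */
static void batch_add_mm(struct tlb_batch_model *b, uint64_t mm_cpumask)
{
	assert(b);
	b->cpumask |= mm_cpumask;
}

/*
 * Flush the batch. With no ranges recorded, every CPU in the mask has to
 * do a full flush. Returns how many CPUs were targeted and clears the mask.
 */
static int batch_flush(struct tlb_batch_model *b)
{
	int targeted = 0;

	assert(b);
	for (int cpu = 0; cpu < 64; cpu++) {
		if (b->cpumask & (UINT64_C(1) << cpu))
			targeted++;	/* stands in for a full TLB flush on this CPU */
	}
	b->cpumask = 0;
	return targeted;
}
```

So in this model, unmapping pages from an mm that ran on CPUs 0-1 and
another that ran on CPUs 1-2 leaves a single mask covering CPUs 0-2, and
the later flush targets those three CPUs once, regardless of how many
pages or which addresses were involved.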