Re: [PATCH -v2 2/2] arm64, tlbflush: don't TLBI broadcast if page reused in write fault

From: Huang, Ying

Date: Wed Oct 22 2025 - 05:46:53 EST


Barry Song <21cnbao@xxxxxxxxx> writes:

>>
>> With PTL, this becomes
>>
>> CPU0:                           CPU1:
>>
>> page fault                      page fault
>> lock PTL
>> write PTE
>> do local tlbi
>> unlock PTL
>>                                 lock PTL        <- pte visible to CPU 1
>>                                 read PTE        <- new PTE
>>                                 do local tlbi   <- new PTE
>>                                 unlock PTL
>
> I agree. Yet the ish barrier can still avoid the page faults during CPU0's PTL.

IIUC, you think that dsb(ish), compared with dsb(nsh), can make the PTE
write visible to other CPUs sooner.  TBH, I suspect that this is the
case.
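
To make the PTL argument quoted at the top concrete, here is a rough
userspace sketch, not kernel code.  It assumes a pthread mutex can stand in
for the PTL and a plain integer for the PTE; struct ptl_model,
fault_handler() and run_ptl_model() are hypothetical names made up for
illustration.  It models only the lock ordering (whoever takes the PTL
second sees the new PTE) and says nothing about dsb(ish) vs. dsb(nsh).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define OLD_PTE 0x1000u	/* hypothetical value: entry before the write fault */
#define NEW_PTE 0x1002u	/* hypothetical value: entry after the fault is fixed */

struct ptl_model {
	pthread_mutex_t ptl;	/* stands in for the page table lock */
	uint64_t pte;		/* protected by ptl */
	int pte_writes;		/* how many "CPUs" actually wrote the PTE */
};

struct cpu {
	struct ptl_model *mm;
	uint64_t pte_at_unlock;	/* PTE value this CPU saw before unlocking */
};

/* Both "CPUs" run the same fault path, as in the quoted timeline. */
static void *fault_handler(void *arg)
{
	struct cpu *cpu = arg;
	struct ptl_model *mm = cpu->mm;

	pthread_mutex_lock(&mm->ptl);
	if (mm->pte == OLD_PTE) {
		/* First CPU to take the PTL fixes the entry. */
		mm->pte = NEW_PTE;
		mm->pte_writes++;
	}
	/* A local TLBI would go here; it acts on the value read below. */
	cpu->pte_at_unlock = mm->pte;
	pthread_mutex_unlock(&mm->ptl);
	return NULL;
}

/* Runs two concurrent faults; returns how many of them wrote the PTE. */
int run_ptl_model(uint64_t seen[2])
{
	struct ptl_model mm = { .pte = OLD_PTE, .pte_writes = 0 };
	struct cpu cpus[2] = { { &mm, 0 }, { &mm, 0 } };
	pthread_t threads[2];
	int i, rc;

	rc = pthread_mutex_init(&mm.ptl, NULL);
	assert(rc == 0);
	for (i = 0; i < 2; i++) {
		rc = pthread_create(&threads[i], NULL, fault_handler, &cpus[i]);
		assert(rc == 0);
	}
	for (i = 0; i < 2; i++) {
		rc = pthread_join(threads[i], NULL);
		assert(rc == 0);
		seen[i] = cpus[i].pte_at_unlock;
	}
	(void)rc;
	pthread_mutex_destroy(&mm.ptl);
	return mm.pte_writes;
}
```

Calling run_ptl_model() from a small main() (built with -pthread) should
report exactly one PTE write, with both threads seeing NEW_PTE before they
unlock, regardless of which one wins the lock.  The unlock/lock pair is
what orders the PTE write against the second reader in this model.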

> CPU0:                           CPU1:
>
> lock PTL
>
> write pte;
> Issue ish barrier
> do local tlbi;
>
>                                 No page fault occurs if tlb misses
>
> unlock PTL
>
>
> Otherwise, it could be:
>
>
> CPU0:                           CPU1:
>
> lock PTL
>
> write pte;
> Issue nsh barrier
> do local tlbi;
>
>                                 page fault occurs if tlb misses
>
> unlock PTL
>
>
> Not quite sure if adding an ish right after the PTE modification has any
> noticeable performance impact on the test? I assume the most expensive part
> is still the tlbi broadcast dsb, not the PTE memory sync barrier?

---
Best Regards,
Huang, Ying