Re: [PATCH RFC 11/14] arm64: Move the ASID allocator code in a separate file

From: qi.fuli@xxxxxxxxxxx
Date: Thu Jun 27 2019 - 05:50:05 EST



On 6/24/19 7:22 PM, Will Deacon wrote:
> On Mon, Jun 24, 2019 at 12:35:35AM +0800, Guo Ren wrote:
>> On Fri, Jun 21, 2019 at 10:16 PM Catalin Marinas
>> <catalin.marinas@xxxxxxx> wrote:
>>> On Wed, Jun 19, 2019 at 07:51:03PM +0800, Guo Ren wrote:
>>>> On Wed, Jun 19, 2019 at 4:54 PM Julien Grall <julien.grall@xxxxxxx> wrote:
>>>>> On 6/19/19 9:07 AM, Guo Ren wrote:
>>>>>> Moving the arm ASID allocator code into a generic one is a good idea. I've
>>>>>> made a patchset for C-SKY and testing is in progress. See:
>>>>>> https://lore.kernel.org/linux-csky/1560930553-26502-1-git-send-email-guoren@xxxxxxxxxx/
>>>>>>
>>>>>> If you plan to separate it into a generic one, I could work with you.
>>>>> Did the ASID allocator work out of the box on C-SKY?
>>>> Almost done, but one question:
>>>> arm64 removed this code from switch_mm:
>>>> cpumask_clear_cpu(cpu, mm_cpumask(prev));
>>>> cpumask_set_cpu(cpu, mm_cpumask(next));
>>>>
>>>> Why? Although arm64 cache operations could affect all harts with CTC
>>>> method of interconnect, I think we should keep this code for
>>>> primitive integrity in Linux, because cpu_bitmap is in mm_struct
>>>> instead of mm->context.
>>> We didn't have a use for this in the arm64 code, so no point in
>>> maintaining the mm_cpumask. On some arm32 systems (ARMv6) with no
>>> hardware broadcast of some TLB/cache operations, we use it to track
>>> where the task has run to issue IPI for TLB invalidation or some
>>> deferred I-cache invalidation.
>> The set/clear operations on mm_cpumask were removed in arm64 compared to
>> arm32. There seems to be no side effect on current arm64 systems, but
>> semantically it's wrong.
>> I think we should keep mm_cpumask just like arm32.
> It was a while ago now, but I remember the atomic update of the mm_cpumask
> being quite expensive when I was profiling this stuff, so I removed it
> because we don't need it for arm64 (at least, it doesn't allow us to
> optimise our shootdowns in practice).

Hi Will,

I think mm_cpumask can be used to filter for the CPUs that have TLB
entries for the mm.
OS jitter can be reduced by invalidating TLB entries only on the
CPUs specified by mm_cpumask(mm).
As I mentioned in an earlier email, 2.5% OS jitter can result in
over a factor of 20 slowdown for the same application [1].
Though it may be an extreme example, reducing OS jitter has been an
issue in HPC environments.
I would like to avoid broadcast TLBI by using mm_cpumask on arm64. Could
you please tell me more about the costs caused by updating mm_cpumask?
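
To make the idea concrete, here is a minimal userspace sketch of the
filtering. It is not kernel code and not the patch linked below: struct
toy_mm, toy_switch_mm(), toy_flush_tlb_mm() and NR_CPUS = 8 are all
hypothetical names chosen for illustration. It also assumes, conservatively,
that a CPU's bit is never cleared once the mm has run there.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8 /* assumption: small fixed CPU count for the model */

/* Hypothetical stand-in for mm_struct: one bit per CPU that has run the mm. */
struct toy_mm {
	uint32_t cpumask;
};

/* Per-CPU count of invalidations received; models the jitter each CPU sees. */
static unsigned int toy_flush_count[NR_CPUS];

/* Model of switch_mm(): record that 'cpu' may now hold TLB entries for 'next'. */
static void toy_switch_mm(struct toy_mm *next, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	next->cpumask |= UINT32_C(1) << cpu;
}

/*
 * Model of a targeted flush: invalidate only on CPUs recorded in the mask.
 * Returns the number of CPUs disturbed. Bits are left set (assumption), so a
 * CPU that once ran the mm keeps receiving invalidations.
 */
static unsigned int toy_flush_tlb_mm(struct toy_mm *mm)
{
	unsigned int cpu, disturbed = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mm->cpumask & (UINT32_C(1) << cpu)) {
			toy_flush_count[cpu]++;
			disturbed++;
		}
	}
	return disturbed;
}
```

For instance, after toy_switch_mm(&mm, 1) and toy_switch_mm(&mm, 5) on a
zeroed mm, toy_flush_tlb_mm(&mm) disturbs two CPUs, where a broadcast would
disturb all eight. The model ignores the cost of the atomic mask update
itself, which is the cost I am asking about.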

Here is my patch:
https://lkml.org/lkml/2019/6/17/703

[1] Ferreira, Kurt B., Patrick Bridges, and Ron Brightwell.
"Characterizing application sensitivity to OS interference using
kernel-level noise injection." Proceedings of the 2008 ACM/IEEE
conference on Supercomputing. IEEE Press, 2008.

Thanks,
QI Fuli

> I still think this is over-engineered for what you want on c-sky and making
> this code generic is a mistake.
>
> Will