Re: [PATCH] cache: Workaround HiSilicon Taishan DC CVAU

From: chenweilong
Date: Tue Dec 28 2021 - 22:12:02 EST


On 2021/12/14 2:56, Will Deacon wrote:
> On Fri, Nov 26, 2021 at 05:11:39PM +0800, Weilong Chen wrote:
>> Taishan's L1/L2 cache is inclusive, and the data is consistent.
>> A change in L1 does not require a DC operation to clean the cache line from L1 to L2,
>> so it is safe to skip cleaning the data cache by address to the point of unification.
>>
>> Without the IDC feature, the kernel needs to flush the icache as well as the dcache,
>> which causes performance degradation.
>>
>> The flaw refers to V110/V200 variant 1.
>>
>> Signed-off-by: Weilong Chen <chenweilong@xxxxxxxxxx>
>> ---
>> Documentation/arm64/silicon-errata.rst | 2 ++
>> arch/arm64/Kconfig | 11 +++++++++
>> arch/arm64/include/asm/cputype.h | 2 ++
>> arch/arm64/kernel/cpu_errata.c | 32 ++++++++++++++++++++++++++
>> arch/arm64/tools/cpucaps | 1 +
>> 5 files changed, 48 insertions(+)
> Hmm. We don't usually apply optimisations for specific CPUs on arm64, simply
> because the diversity of CPUs out there means it quickly becomes a
> fragmented mess.
>
> Is this patch purely a performance improvement? If so, please can you
> provide some numbers in an attempt to justify it?

Yes, it's purely a performance improvement. I have a test program like this:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>

int main(void)
{
        void *tmp;
        size_t len = 200 * 1024 * 1024;
        struct timeval start, end;
        long interval;

        /* Map 200 MiB of anonymous memory and touch every page. */
        tmp = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (tmp == MAP_FAILED) {
                perror("mmap failed");
                exit(EXIT_FAILURE);
        }
        memset(tmp, 0, len);

        /* Time only the RW -> RX permission change. */
        gettimeofday(&start, NULL);
        if (mprotect(tmp, len, PROT_READ | PROT_EXEC)) {
                perror("mprotect failed");
                exit(EXIT_FAILURE);
        }
        gettimeofday(&end, NULL);
        interval = 1000000L * (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec);
        printf("interval = %fms\n", interval / 1000.0);
        return 0;
}

Without this fix, the mprotect takes:

interval = 25.608000ms

And with this fix:

interval = 0.689000ms

That is roughly 37 times faster for this mprotect() call.
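
A single gettimeofday() sample can be noisy, so the measurement could also be repeated and the best run reported. The following is only a rough sketch of that idea, not the program that produced the numbers above; the helper names time_mprotect_ms() and best_mprotect_ms() are made up for illustration, and it assumes a Linux/glibc system where MAP_ANONYMOUS and CLOCK_MONOTONIC are available.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper: time one mprotect(RW -> RX) over a freshly
 * touched anonymous mapping of `len` bytes.
 * Returns elapsed milliseconds, or -1.0 if mmap/mprotect failed.
 */
static double time_mprotect_ms(size_t len)
{
        struct timespec start, end;
        void *buf;
        double ms;

        buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
                return -1.0;
        memset(buf, 0, len);    /* fault in every page before timing */

        clock_gettime(CLOCK_MONOTONIC, &start);
        if (mprotect(buf, len, PROT_READ | PROT_EXEC)) {
                munmap(buf, len);
                return -1.0;
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        ms = (end.tv_sec - start.tv_sec) * 1e3 +
             (end.tv_nsec - start.tv_nsec) / 1e6;
        munmap(buf, len);
        return ms;
}

/*
 * Hypothetical helper: minimum time over `runs` attempts.
 * Returns -1.0 if every attempt failed.
 */
static double best_mprotect_ms(size_t len, int runs)
{
        double best = -1.0;

        for (int i = 0; i < runs; i++) {
                double ms = time_mprotect_ms(len);

                if (ms >= 0.0 && (best < 0.0 || ms < best))
                        best = ms;
        }
        return best;
}
```

Calling best_mprotect_ms(200UL * 1024 * 1024, 5) from main() and printing the result would give a figure comparable to the intervals above, with less run-to-run variation.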

If you think this is acceptable, I will send a v2, since the original patch breaks the CPU hotplug checks.

>
> Thanks,
>
> Will
> .