Re: [PATCH] arm64: implement raw_smp_processor_id() using thread_info

From: Christoph Lameter (Ampere)
Date: Wed May 01 2024 - 12:24:07 EST


On Wed, 1 May 2024, Puranjay Mohan wrote:

Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802cd608 <+0>: nop
0xffff8000802cd60c <+4>: nop
0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
0xffff8000802cd618 <+16>: add x0, x0, #0x8
0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
0xffff8000802cd620 <+24>: ret

In general arm64 has inefficient per cpu variable access. On x86 it is possible to access the processor id via a segment register relative access with a single instruction.

Arm64 calculates the address of a percpu variable for each access. This results in inefficiencies because:

1. The address calculation is processor specific. Therefore preemption needs to be disabled during the calculation of the address and while it is in use.

2. Additional registers are used causing the compiler to potentially generate less efficient code.

3. Even RMW (read-modify-write) instructions on percpu variables require the disabling of preemption due to the address calculation.
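
The two-step access described above (compute a CPU-specific address, then load through it) can be modelled in plain userspace C. This is only a sketch, not kernel code: NR_CPUS, AREA_SIZE, percpu_mem and the helper names are hypothetical, percpu_offset() stands in for the value read from tpidr_el1, and CPU_NUMBER_SYM is set to 8 only to mirror the #0x8 in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS        4   /* hypothetical CPU count for the model */
#define AREA_SIZE      64  /* hypothetical size of one CPU's percpu area */
#define CPU_NUMBER_SYM 8   /* hypothetical offset of cpu_number in an area */

/* Backing store: one private area per CPU, laid out back to back. */
static unsigned char percpu_mem[NR_CPUS * AREA_SIZE];

/* Stand-in for tpidr_el1: the percpu offset of the given CPU. */
static size_t percpu_offset(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return (size_t)cpu * AREA_SIZE;
}

/* Store each CPU's id into its own copy of cpu_number. */
static void setup_percpu(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        int id = (int)cpu;
        memcpy(&percpu_mem[percpu_offset(cpu) + CPU_NUMBER_SYM], &id, sizeof(id));
    }
}

/* Step 1: symbol + offset. The result is only valid on running_cpu. */
static unsigned char *this_cpu_addr(unsigned int running_cpu, size_t sym)
{
    return &percpu_mem[percpu_offset(running_cpu) + sym];
}

/* Step 2: load through the previously computed address. */
static int load_int(const unsigned char *addr)
{
    int v;
    memcpy(&v, addr, sizeof(v));
    return v;
}
```

If the task were moved to another CPU between step 1 and step 2, the pointer from step 1 would still refer to the old CPU's area, which is why preemption has to stay disabled across both steps.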

Russell King has a patchset for NUMA text replication and as part of that he introduces per cpu kernel page tables.

https://lwn.net/Articles/957023/

If we had per cpu page tables then we could create a mapping for a fixed address virtual memory range to the physical per cpu area for each cpu.

With that the address calculation would no longer be necessary for per cpu variable access and workarounds like this would not be necessary anymore.

The retrieval of the cpu id would be a single instruction that performs a load from a fixed virtual address. No disabling of preemption etc. would be required.
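
A rough userspace model of that idea, under stated assumptions: every CPU gets its own page table entry that maps the same fixed virtual range onto that CPU's physical percpu area. All names here (PERCPU_FIXED_VA, CPU_NUMBER_VA, struct cpu_pgtable, load_int_va) are hypothetical, and the translation that the MMU would perform in hardware is written out in software so the effect can be seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS          4        /* hypothetical CPU count */
#define MODEL_PAGE_SIZE  4096     /* hypothetical size of the fixed range */
#define PERCPU_FIXED_VA  0x10000  /* hypothetical fixed virtual address */
#define CPU_NUMBER_VA    (PERCPU_FIXED_VA + 8)  /* constant on every CPU */

/* "Physical" percpu areas, one per CPU. */
static unsigned char phys_percpu[NR_CPUS][MODEL_PAGE_SIZE];

/* Per cpu page table, reduced to the single entry for the fixed range. */
struct cpu_pgtable {
    unsigned char *fixed_range_target;
};
static struct cpu_pgtable pgtable[NR_CPUS];

/* Point each CPU's fixed range at its own area and store its id there. */
static void setup_fixed_mapping(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        int id = (int)cpu;
        pgtable[cpu].fixed_range_target = phys_percpu[cpu];
        memcpy(&phys_percpu[cpu][CPU_NUMBER_VA - PERCPU_FIXED_VA], &id, sizeof(id));
    }
}

/* MMU model: the same virtual address resolves per running CPU. */
static int load_int_va(unsigned int running_cpu, size_t va)
{
    int v;
    assert(running_cpu < NR_CPUS);
    assert(va >= PERCPU_FIXED_VA);
    assert(va + sizeof(v) <= PERCPU_FIXED_VA + MODEL_PAGE_SIZE);
    memcpy(&v, pgtable[running_cpu].fixed_range_target + (va - PERCPU_FIXED_VA),
           sizeof(v));
    return v;
}
```

In this model the caller never computes a CPU-specific address: CPU_NUMBER_VA is a compile-time constant, and whichever CPU executes the load sees its own data, so there is no window in which a migration could leave a stale pointer behind.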