Re: [PATCH 1/9] cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL

From: Christoph Lameter (Ampere)
Date: Fri May 03 2024 - 13:07:33 EST


On Thu, 2 May 2024, Ankur Arora wrote:

The intent was to make the processor aware that we are in a spin loop. Various
processors have different actions that they take upon encountering such a cpu
relax operation.

Sure, though most processors don't have a nice mechanism to do that.
x86 clearly has the REP; NOP thing. arm64 only has a YIELD which from my
measurements is basically a NOP when executed on a system without
hardware threads.

And that's why only x86 defines ARCH_HAS_CPU_RELAX.

My impression is that the use of arm YIELD has led CPU architects to implement mechanisms similar to x86's PAUSE. This is not part of the spec, but it has been there for a long time. So I would rather leave it as is.


These are not the same and I think we need both config options.

My main concern is that poll_idle() conflates polling in idle with
ARCH_HAS_CPU_RELAX, when they aren't really related.

So, poll_idle(), and its users should depend on ARCH_HAS_OPTIMIZED_POLL
which, if defined by some architecture, means that poll_idle() would
be better than a spin-wait loop.

Beyond that I'm okay to keep ARCH_HAS_CPU_RELAX around.

That said, do you see a use for ARCH_HAS_CPU_RELAX? The only current
user is the poll-idle path.

I would think that we need a generic cpu_poll() mechanism that can fall back to cpu_relax() on processors that do not offer such a thing (x86?) and, if not even that is there, fall back further.

We already have something like that in the smp_cond_acquire mechanism (a bit weird to put that in barrier.h).

So what if we had

void cpu_wait(unsigned flags, unsigned long timeout, void *cacheline);

With

#define CPU_POLL_INTERRUPT (1 << 0)
#define CPU_POLL_EVENT (1 << 1)
#define CPU_POLL_CACHELINE (1 << 2)
#define CPU_POLL_TIMEOUT (1 << 3)
#define CPU_POLL_BROADCAST_EVENT (1 << 4)
#define CPU_POLL_LOCAL_EVENT (1 << 5)


The cpu_poll() function could be defined generically in asm-generic, and arches could then provide their own implementations that make the best use of their hardware polling mechanisms.

Any number of flags could be specified simultaneously. On ARM this would then map to SEVL, SEV, and WFI/WFE or WFIT/WFET.

So f.e.

cpu_wait(CPU_POLL_INTERRUPT|CPU_POLL_EVENT|CPU_POLL_TIMEOUT|CPU_POLL_CACHELINE, timeout, &mylock);

to wait on a change in a cacheline with a timeout.
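
As a rough userspace sketch of what a generic fallback might look like (this is not kernel code): the flag values are the ones listed above, while cpu_wait_generic(), cpu_relax_fallback(), now_ns(), the nanosecond timeout unit, and the return value are all assumptions made for illustration. The sketch only honors CPU_POLL_CACHELINE and CPU_POLL_TIMEOUT, since a plain spin loop has no way to observe interrupts or events.

```c
#include <stdint.h>
#include <time.h>

/* Flag values as listed in the proposal above. */
#define CPU_POLL_INTERRUPT       (1 << 0)
#define CPU_POLL_EVENT           (1 << 1)
#define CPU_POLL_CACHELINE       (1 << 2)
#define CPU_POLL_TIMEOUT         (1 << 3)
#define CPU_POLL_BROADCAST_EVENT (1 << 4)
#define CPU_POLL_LOCAL_EVENT     (1 << 5)

/* Hypothetical stand-in for cpu_relax(): only a compiler barrier here. */
static inline void cpu_relax_fallback(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Wall-clock time in nanoseconds (C11 timespec_get; not a kernel clock). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until the watched word differs from its value at entry, or until
 * timeout_ns has elapsed. Returns the flag that ended the wait, or 0 if
 * no condition this fallback can observe was requested. The return value
 * is an assumption; the proposed cpu_wait() returns void.
 */
static unsigned cpu_wait_generic(unsigned flags, unsigned long timeout_ns,
				 void *cacheline)
{
	volatile unsigned long *word = cacheline;
	unsigned long old = 0;
	uint64_t deadline = now_ns() + timeout_ns;

	if (!(flags & (CPU_POLL_CACHELINE | CPU_POLL_TIMEOUT)))
		return 0;
	if (flags & CPU_POLL_CACHELINE)
		old = *word;	/* snapshot to compare against */

	for (;;) {
		if ((flags & CPU_POLL_CACHELINE) && *word != old)
			return CPU_POLL_CACHELINE;
		if ((flags & CPU_POLL_TIMEOUT) && now_ns() >= deadline)
			return CPU_POLL_TIMEOUT;
		cpu_relax_fallback();
	}
}
```

A call such as cpu_wait_generic(CPU_POLL_CACHELINE|CPU_POLL_TIMEOUT, 1000000, &mylock) would then spin for at most about a millisecond. An arch implementation would presumably replace the loop body with its own wait instruction.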

In addition, we could then think about making effective use of the signaling mechanism provided by SEV in the core logic of the kernel. Maybe that is more effective than waiting for a cacheline in some situations.


With WFE, sure there's a problem in that you depend on an interrupt or
the event-stream to get out of the wait. And so, sometimes you would
overshoot the target poll timeout.

Right. The dependence on the event stream makes this approach a bit strange. Having some sort of generic cpu_wait() feature with a timeout spec could avoid that.
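
To put a rough number on the overshoot: if the only wakeup source is an event stream that ticks every period_ns, a WFE-based wait can only end on a tick, so it returns at the first tick at or after the requested timeout. Below is a small sketch of that arithmetic, assuming the wait starts on a tick and nothing else wakes the CPU. The helper names are made up, and the periods used are illustrative only.

```c
#include <stdint.h>

/*
 * Time (relative to the start of the wait) of the first event-stream tick
 * at or after timeout_ns, assuming ticks at exact multiples of period_ns.
 */
static uint64_t wfe_wakeup_ns(uint64_t timeout_ns, uint64_t period_ns)
{
	/* Round timeout_ns up to the next multiple of period_ns. */
	return (timeout_ns + period_ns - 1) / period_ns * period_ns;
}

/* How far past the requested timeout the wait actually ends. */
static uint64_t wfe_overshoot_ns(uint64_t timeout_ns, uint64_t period_ns)
{
	return wfe_wakeup_ns(timeout_ns, period_ns) - timeout_ns;
}
```

With a 100 us period, a 250 us timeout would end at 300 us, an overshoot of 50 us. The worst case approaches one full period, which is what a timeout-aware wait would avoid.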