RE: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

From: David Laight
Date: Wed Feb 08 2023 - 07:45:27 EST


From: Rob Herring
> Sent: 07 February 2023 17:06
>
> On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> > From: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> >
> > This key allows device trees to specify the performance of misaligned
> > accesses to main memory regions from each CPU in the system.
> >
> > Signed-off-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> > Signed-off-by: Evan Green <evan@xxxxxxxxxxxx>
> > ---
> >
> > (no changes since v1)
> >
> > Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > index c6720764e765..2c09bd6f2927 100644
> > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > @@ -85,6 +85,21 @@ properties:
> > $ref: "/schemas/types.yaml#/definitions/string"
> > pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
> >
> > + riscv,misaligned-access-performance:
> > + description:
> > + Identifies the performance of misaligned memory accesses to main memory
> > + regions. There are three flavors of unaligned access performance: "emulated"
> > + means that misaligned accesses are emulated via software and thus
> > + extremely slow, "slow" means that misaligned accesses are supported by
> > + hardware but still slower than aligned access sequences, and "fast"
> > + means that misaligned accesses are as fast or faster than the
> > + corresponding aligned access sequences.
> > + $ref: "/schemas/types.yaml#/definitions/string"
> > + enum:
> > + - emulated
> > + - slow
> > + - fast
>
> I don't think this belongs in DT. (I'm not sure about a userspace
> interface either.)
>
> Can't this be tested and determined at runtime? Do misaligned accesses
> and compare the performance. We already do this for things like memcpy
> or crypto implementation selection.

There is also a long discussion about misaligned accesses
for LoongArch.

Basically if you want to run a common kernel (and userspace)
you have to default to compiling everything with -mstrict-align
so that the compiler generates byte accesses for anything
marked 'packed' (etc).
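
As a rough illustration of the kind of code this affects, here is a
minimal sketch. The struct 'wire_hdr' and both helper names are made up
for this example and do not come from the patch or the kernel. The
'length' field sits at offset 1, so the compiler has to choose between a
single (misaligned) word load and a sequence of byte loads, depending on
whether strict alignment is assumed for the target. The packed attribute
is GCC/Clang syntax.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wire-format header: 'length' lands at offset 1. */
struct __attribute__((packed)) wire_hdr {
	uint8_t  type;
	uint32_t length;	/* misaligned because the struct is packed */
};

/*
 * Access through the packed type. The compiler knows the field may be
 * misaligned and emits either one word load or four byte loads,
 * according to the target's alignment settings.
 */
static uint32_t hdr_length(const struct wire_hdr *h)
{
	return h->length;
}

/*
 * Portable alternative for a raw buffer: memcpy() from any address.
 * Compilers normally turn this into the same load sequence as above.
 */
static uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Both helpers return the same value for the same bytes; only the
generated load sequence differs between targets.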

Run-time tests can optimise some hot-spots.

In any case 'slow' is probably pointless - unless the accesses
take more than 1 or 2 extra cycles.

Oh, and you really never, ever want to emulate them.

Technically misaligned reads on (some) x86-64 CPUs are slower
than aligned ones, but the difference is marginal.
I've measured two misaligned 64-bit reads every clock.
They are consistently slower, but by much less than one clock
per cache line.
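
For anyone who wants to try a comparison like this, the following is a
minimal userspace sketch, not the code used for the numbers above. All
names (sum_words, time_reads, misaligned_read_ratio) and the buffer and
pass counts are assumptions for illustration. It sums 64-bit words from a
buffer starting at offset 0 and at offset 1, so that in the second case
one read in eight straddles a 64-byte cache line. The empty asm statement
is a GCC/Clang compiler barrier that stops the repeated passes from being
optimised away. On a strict-alignment target the memcpy() becomes byte
loads, so the ratio then measures that sequence instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_WORDS 4096	/* 32 KiB of data, chosen to sit in L1 on most CPUs */
#define PASSES    2000

/* Sum 'words' 64-bit values starting at 'base', which may be misaligned. */
static uint64_t sum_words(const unsigned char *base, size_t words)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < words; i++) {
		uint64_t v;

		memcpy(&v, base + i * sizeof(v), sizeof(v));
		sum += v;
	}
	return sum;
}

/* CPU time, in clock() ticks, for 'passes' runs over the buffer. */
static double time_reads(const unsigned char *base, size_t words,
			 int passes, uint64_t *sink)
{
	uint64_t acc = 0;
	clock_t t0 = clock();

	for (int p = 0; p < passes; p++) {
		/* Barrier: make the compiler redo the reads on every pass. */
		__asm__ volatile("" : : "r"(base) : "memory");
		acc += sum_words(base, words);
	}
	*sink = acc;
	return (double)(clock() - t0);
}

/* Misaligned time divided by aligned time; 0 if the timer was too coarse. */
static double misaligned_read_ratio(void)
{
	static _Alignas(64) unsigned char buf[BUF_WORDS * 8 + 64];
	uint64_t sink_a, sink_m;
	double aligned, misaligned;

	memset(buf, 1, sizeof(buf));
	aligned = time_reads(buf, BUF_WORDS, PASSES, &sink_a);
	misaligned = time_reads(buf + 1, BUF_WORDS, PASSES, &sink_m);
	if (aligned <= 0.0 || sink_a != sink_m)
		return 0.0;
	return misaligned / aligned;
}
```

A ratio close to 1.0 would suggest the 'fast' case; clock() is coarse, so
treat a single run as a hint rather than a measurement.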

David
