Re: [PATCH v2] ARC: io.h: Implement reads{x}()/writes{x}()

From: Vineet Gupta
Date: Mon Dec 03 2018 - 12:31:32 EST


On 12/3/18 2:10 AM, David Laight wrote:
> From: Vineet Gupta
> ...
>>> It also seems to have used a different type of loop to the
>>> other example, probably less efficient.
>>> (Not that I'm an expert on ARC opcodes.)
>> The difference is due to the ISA and the resulting ARC gcc backends. ARCompact-based
>> cores don't support unaligned access, and the loop there was a ZOL (Zero Overhead
>> Loop). For ARCv2-based cores, the gcc backend has been tweaked to generate fewer
>> ZOLs, hence you see the more canonical tst-and-branch style loop.
> Is this another case of the hardware implementing 'hardware' loop
> instructions that execute slower than ones made of simple instructions?

Not really. ZOLs allow hardware loops with no instruction/cycle overhead in
general. However, as micro-architectures get more complicated, newer "gizmos" are
added to the machinery, which sometimes makes it harder for compilers to
optimize for all cases. The ARCv2 ISA has a new DBNZ (decrement and branch if
not zero) instruction, similar to the x86 one you refer to below, to implement
loops, and that is preferred over ZOL.
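As a rough C-level illustration of the loop shapes mentioned above (a sketch only; it assumes nothing about what a particular gcc version actually emits): a loop that decrements its counter and tests it at the bottom has the shape a decrement-and-branch instruction such as DBNZ expresses, while a loop that tests the counter at the top is closer to the tst-and-branch style. The function names are hypothetical. Whether either ends up as a ZOL, a DBNZ loop, or a compare-and-branch loop is up to the backend and optimization flags.

```c
#include <stddef.h>

/* Hypothetical example: counter decremented and tested at the bottom,
 * the shape that maps naturally onto a decrement-and-branch-if-not-zero
 * instruction. Requires n > 0 on entry. */
static unsigned sum_dbnz_shape(const unsigned *v, unsigned n)
{
	unsigned s = 0;

	do {
		s += *v++;
	} while (--n != 0);
	return s;
}

/* Hypothetical example: same computation with the test at the top
 * (tst-and-branch style), which also handles n == 0. */
static unsigned sum_test_shape(const unsigned *v, unsigned n)
{
	unsigned s = 0;

	while (n != 0) {
		s += *v++;
		n--;
	}
	return s;
}
```

The bottom-tested form saves the entry test but needs a guaranteed non-zero count, which is why compilers usually add a guard before it when the count may be zero.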

> The worst example has to be the x86 'loop' (dec cx and jump nz)
> instruction which is microcoded on Intel cpus.
> That makes it very difficult to use the new addx instruction to
> get two dependency chains through a loop.
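A C-level sketch of the "two dependency chains" idea, applied (as an assumption, for illustration) to an Internet-checksum-style ones'-complement sum of 16-bit words. It does not use or model the x86 ADCX/ADOX carry flags; it only shows how splitting the sum into two independent accumulators removes the serial dependency between consecutive adds. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator of 16-bit words into a 16-bit
 * ones'-complement sum (end-around carry). */
static uint16_t fold16(uint64_t s)
{
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* One dependency chain: every add depends on the previous one. */
static uint16_t csum_one_chain(const uint16_t *w, size_t n)
{
	uint64_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += w[i];
	return fold16(s);
}

/* Two independent chains over even/odd words, merged at the end.
 * Each accumulator depends only on its own previous value, so an
 * out-of-order core may overlap the two adds. */
static uint16_t csum_two_chains(const uint16_t *w, size_t n)
{
	uint64_t a = 0, b = 0;
	size_t i = 0;

	for (; i + 1 < n; i += 2) {
		a += w[i];
		b += w[i + 1];
	}
	if (i < n)
		a += w[i];	/* odd trailing word */
	return fold16(a + b);
}
```

Both functions return the same result because ones'-complement addition is associative and the deferred carries are folded back in at the end.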