Re: Fixing MIPS delay slot emulation weakness?

From: Maciej W. Rozycki
Date: Sun Dec 16 2018 - 20:55:39 EST


On Sun, 16 Dec 2018, Rich Felker wrote:

> So in theory it's possible that there's a cpu model with fancy new
> core instructions but no fpu. In this case, you would need the
> capability to emulate or execute-out-of-line these instructions. But I
> have no idea if such cpu models actually exist. If not, the concern
> can probably be ignored and it suffices to emulate just the parts of
> the base ISA that are valid in delay slots.

What do you call "a cpu model with fancy new core instructions"?

We've gone through 4 legacy MIPS base ISA revisions (I to IV) and then 4
modern ones that matter (R1 to R5; R4 was left out and R6 actually does
not have FPU branch delay slots), plus a bunch of ASEs (Application
Specific Extensions), such as DSP, MDMX, MIPS-3D, MSA, etc., each defining
further instructions. And then the microMIPS R3 and R5 ISAs (R6 uses a
different instruction encoding and does not have delay slots at all).
The MIPS16 ISA does not count however, even though it has delay slots and
we support it, because it does not have FPU instructions, let alone ones
that require delay slot emulation.

Some of the ASEs do not matter, e.g. we don't support MDMX in Linux as it
has user state we don't handle with context switches, and MIPS-3D and MSA
both imply an FPU, so software making use of them won't run with our FPU
emulation anyway as these ASEs' instructions are not emulated. Anything
else is potentially required.

As to actual implementations I believe all the Cavium Octeon line CPUs
(David, please correct me if I am wrong) have no FPU and they have vendor
extensions beyond the base ISA + ASE instruction set. Arguably you could
say that their additional instructions should not be scheduled into FPU
branch delay slots then; however, the toolchain will happily do that, as I
wrote before.

I don't fully remember what the situation is WRT NetLogic/Broadcom XLR
and XLP chips. They do have vendor extensions, though IIRC they do have
an FPU too.

But then we have the "nofpu" kernel parameter anyway, which forces FPU
emulation for any hardware, so we need to emulate delay slots in that mode
with any hardware.
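
 To make concrete what emulating an FPU branch involves, here is a minimal
sketch in C. It is not the kernel's code; the names (bc1_info, decode_bc1,
bc1_next_pc and so on) are made up for illustration. It assumes the
classic 32-bit BC1F/BC1T encoding (COP1 major opcode, BC in the rs field,
the cc/nd/tf bits and a 16-bit word offset relative to the delay slot) and
ignores MIPS16 and microMIPS entirely:

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_COP1 0x11u /* major opcode, bits 31..26 */
#define RS_BC1  0x08u /* rs field value selecting BC1x, bits 25..21 */

/* Hypothetical helper type, for illustration only. */
struct bc1_info {
    bool     is_bc1;   /* word is a BC1F/BC1T/BC1FL/BC1TL */
    bool     likely;   /* nd bit: delay slot nullified if not taken */
    bool     on_true;  /* tf bit: branch when the condition is true */
    unsigned cc;       /* FP condition code number, 0..7 */
    int32_t  offset;   /* byte offset from the delay slot address */
};

static struct bc1_info decode_bc1(uint32_t insn)
{
    struct bc1_info b = {0};

    if ((insn >> 26) != OP_COP1 || ((insn >> 21) & 0x1f) != RS_BC1)
        return b;

    /* Sign-extend the 16-bit immediate without relying on casts. */
    int32_t imm = (int32_t)(insn & 0xffff);
    if (imm & 0x8000)
        imm -= 0x10000;

    b.is_bc1  = true;
    b.cc      = (insn >> 18) & 0x7;
    b.likely  = (insn >> 17) & 1;
    b.on_true = (insn >> 16) & 1;
    b.offset  = imm * 4;
    return b;
}

/* cond is the value of the selected FP condition bit. */
static bool bc1_taken(const struct bc1_info *b, bool cond)
{
    return cond == b->on_true;
}

/* The delay slot instruction at pc + 4 runs unless a "likely"
 * branch is not taken. */
static bool bc1_runs_delay_slot(const struct bc1_info *b, bool cond)
{
    return bc1_taken(b, cond) || !b->likely;
}

/* Where execution continues after the branch and its delay slot. */
static uint32_t bc1_next_pc(uint32_t pc, const struct bc1_info *b, bool cond)
{
    return bc1_taken(b, cond) ? pc + 4 + (uint32_t)b->offset : pc + 8;
}
```

 Decoding the branch itself is the easy part. The trouble is the
instruction at pc + 4, which may be anything the toolchain chose to
schedule there and still has to take effect before control reaches the
address computed above.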

I'm afraid the problem is complex to solve overall, which is why we still
have issues, 18 years on from the inclusion of the FPU emulator:

commit 4c55adaa6d06e5533aebaceea7640ecf10952231
Author: Ralf Baechle <ralf@xxxxxxxxxxxxxx>
Date:   Sat Nov 25 04:49:46 2000 +0000

    Kernel FPU emulator, chain saw edition.

(in the LMO GIT repo) and I think actually running the delay-slot
instruction (with a possible exception for things like ADDIUPC) rather
than interpreting it is the only feasible solution.
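
 To illustrate why something like ADDIUPC needs special handling, here is
a toy model in C. It is a simplification, not the real instruction
semantics or any kernel code: pcrel_add and fixup_pcrel are hypothetical
names, and the model is just "result = PC + offset". Any instruction whose
result depends on its own address computes a different value when it is
run from somewhere else, such as an out-of-line trampoline:

```c
#include <stdint.h>

/* Simplified PC-relative add: the result depends on where the
 * instruction executes. */
static uint32_t pcrel_add(uint32_t pc, int32_t offset)
{
    return pc + (uint32_t)offset;
}

/* Hypothetical correction: given the value computed at the trampoline
 * address, recover the value the instruction would have produced at its
 * original address. */
static uint32_t fixup_pcrel(uint32_t result_at_tramp,
                            uint32_t tramp_pc, uint32_t orig_pc)
{
    return result_at_tramp - tramp_pc + orig_pc;
}
```

 With an original address of 0x00400104 and an offset of 16 the expected
result is 0x00400114, whereas the same operation run at 0x7fff0000 yields
0x7fff0010. Interpreting such instructions directly, or correcting the
result afterwards as in fixup_pcrel, avoids the discrepancy.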

I'm not involved with MIPS architecture development anymore though and at
this point I only care about the few legacy platforms I have been taking
care of since forever, such as the DECstation port, for which our current
emulation solution suffices, so I am not going to commit myself to making
any inventions in this area. I hope my input is valuable though and will
help someone working on this.

Maciej