Re: BogoMIPS

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Wed, 30 Sep 1998 15:32:22 +0100


On Wed, Sep 30, 1998 at 09:00:09AM -0400, Richard B. Johnson wrote:
> It has two jumps because one is not guaranteed to flush the cache.
> I want any cache-refills to have been completed before the actual loop.

I assume you don't want to flush the entire I-cache.
Do you mean the instruction prefetch queue?
I know that Intel say to use a far jump instruction to flush that.
It is required when switching to real mode, for example.

> What I really wanted to do was (in Intel code):
>
> PUSH CS ; This is a long-word (32-bit segment)
> PUSH OFFSET DOIT ; This is a long-word (32-bit segment)
> RETF ; Far 'return' to DOIT address.
> DOIT: SUB EAX,1
> JNC DOIT
>
> This would have made a 'far' JMP, using a far RET, to the loop.
> This guarantees a clean cache.

Again, it sounds like you mean the prefetch queue. I wonder if the
above code uses RETF instead of JMPF on the grounds that the processor
can't do branch prediction with RETF. If so, note that since MMX, the
processor has a "return stack buffer" for predicting the target of
returns, though I don't know whether that would be relevant here.

> However, the GNU pseudo-assembler would not resolve the address of
> the DOIT label (it is only known at link time), so I would have
> to use some trick that I don't know about.

The GNU assembler has no trouble. This should work:

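    pushl   %cs         # lretl pops EIP first, then CS, so push CS first
    pushl   $0f         # then the offset of local label 0 (forward)
    lretl               # far return to 0f, reloading CS
    .balign 32          # padding is jumped over, never executed
0:
    subl    $1,%eax
    jnc     0b          # loop until the borrow out of zero, as in JNC DOIT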

Using decl %eax would give a bigger BogoMIPS rating on 386s, which might
please really poor people :-), though you'd want an explicit check for
%eax = 0.
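A small C model may make the zero-check point concrete. It is only an illustration, not kernel code: the function names are invented for it, and a bit mask stands in for the register width so the wraparound case finishes quickly. Since decl leaves the carry flag alone, a decl loop has to end on jnz; started at 0 it wraps and runs 2^w times for a w-bit register, while the sub/jnc form stops on the borrow and runs n + 1 times for any n.

```c
#include <stdint.h>

/* Iterations of:  top: sub $1,%reg ; jnc top
   The loop ends on the borrow out of zero, so n = 0 still terminates.
   `mask` models the register width (0xff for 8 bits, 0xffffffff for 32). */
static uint64_t iters_sub_jnc(uint32_t n, uint32_t mask)
{
    uint64_t count = 0;
    uint32_t reg = n & mask;
    for (;;) {
        int borrow = (reg == 0);      /* sub $1 sets CF only when reg was 0 */
        reg = (reg - 1) & mask;
        count++;
        if (borrow)                   /* jnc falls through when CF = 1 */
            break;
    }
    return count;
}

/* Iterations of:  top: dec %reg ; jnz top
   dec does not touch CF, so the loop must test ZF instead; starting
   from 0 it wraps to all-ones and runs mask + 1 times. */
static uint64_t iters_dec_jnz(uint32_t n, uint32_t mask)
{
    uint64_t count = 0;
    uint32_t reg = n & mask;
    do {
        reg = (reg - 1) & mask;
        count++;
    } while (reg != 0);               /* jnz loops while ZF = 0 */
    return count;
}
```

With mask = 0xffffffff, iters_dec_jnz(0, mask) would spin 2^32 times, which is the case the explicit check guards against.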

> Therefore, I had to hack around with '486s, '586s, and '686s to find
> a jump combination that would work.

I recommend against this.

Intel documents the far jump's effect on the prefetch queue, when
describing how to switch to real mode. You've already found out that a
single near jump doesn't flush the queue properly -- I strongly suspect
that has to do with branch prediction and/or the size of the two-line
I-cache decoding buffer used in some processors. In future, two jumps
on 16-byte alignment may not be adequate.

No, the only way to do this is the _documented_ way (ick, if I only had
the document handy :-), which is some form of far jump as you were
trying to do.

I suspect even that may stop working in time, as the prefetch queue
_timing_ is not documented. The TSC (time-stamp counter) would seem like
the right thing to use when it's available; we can then fall back on the
timing loop for the cases where it isn't, which are all known and tested.
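A minimal sketch of that arrangement in C, assuming a caller-supplied cycle reader (on x86 it would wrap RDTSC) and a has_tsc flag detected elsewhere; delay_cycles, struct delay_source and cycles_per_loop are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook returning a free-running cycle count
   (on x86 this would be RDTSC). Supplied by the caller. */
typedef uint64_t (*cycle_reader_fn)(void);

struct delay_source {
    bool has_tsc;               /* assumption: detected elsewhere, e.g. via CPUID */
    cycle_reader_fn read_tsc;   /* used only when has_tsc is true */
    uint64_t cycles_per_loop;   /* fallback: from a BogoMIPS-style calibration, nonzero */
};

/* Spin on the cycle counter; unsigned subtraction tolerates wraparound. */
static void tsc_delay(cycle_reader_fn read_tsc, uint64_t cycles)
{
    uint64_t start = read_tsc();
    while (read_tsc() - start < cycles)
        ;
}

/* The calibrated busy loop; volatile stops the compiler deleting it. */
static void loop_delay(uint64_t loops)
{
    for (volatile uint64_t i = loops; i != 0; i--)
        ;
}

/* Prefer the TSC; fall back on the timing loop only where there is none. */
void delay_cycles(const struct delay_source *src, uint64_t cycles)
{
    if (src->has_tsc && src->read_tsc != NULL)
        tsc_delay(src->read_tsc, cycles);
    else
        loop_delay(cycles / src->cycles_per_loop);
}
```

The point of the split is that the TSC path depends only on a documented counter, while the loop path keeps depending on prefetch and alignment behaviour, so it is confined to the CPUs that lack a TSC.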

> You can easily muck with this in user mode and test all possible
> combinations. Take out the second jump and you will have bobble on
> Pentium II (Klamath) machines. Take out all jumps and just align the
> function and you will have problems with everything except
> '486s. Don't align the function and you will have problems even with
> '486s.

I don't have any of those processors :-)
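For anyone who does have them, a minimal user-mode timing harness might look like the following C sketch (POSIX clock_gettime assumed; busy_loop and time_loop are hypothetical names). It only times a compiler-generated loop, so to reproduce the jump and alignment variants you would substitute the corresponding assembly for busy_loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* The loop under test. Replace the body with the assembly variant
   (jumps, alignment) you want to compare. */
static void busy_loop(uint32_t n)
{
    for (volatile uint32_t i = n; i != 0; i--)
        ;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time `runs` executions of busy_loop(n) and report the fastest and
   slowest in nanoseconds. A wide spread is the run-to-run variation
   that would upset a loop-based calibration. */
static void time_loop(uint32_t n, int runs, uint64_t *min_ns, uint64_t *max_ns)
{
    *min_ns = UINT64_MAX;
    *max_ns = 0;
    for (int r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        busy_loop(n);
        uint64_t dt = now_ns() - t0;
        if (dt < *min_ns)
            *min_ns = dt;
        if (dt > *max_ns)
            *max_ns = dt;
    }
}
```

For example, call time_loop(1000000, 20, &lo, &hi) once per variant and compare how far hi drifts from lo.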

-- Jamie
