Re: BogoMIPS

Richard B. Johnson (root@chaos.analogic.com)
Wed, 30 Sep 1998 15:54:55 -0400 (EDT)


On Wed, 30 Sep 1998, Jamie Lokier wrote:

> On Wed, Sep 30, 1998 at 09:00:09AM -0400, Richard B. Johnson wrote:
> > It has two jumps because one is not guaranteed to flush the cache.
> > I want any cache-refills to have been completed before the actual loop.
>
> I assume you don't want to flush the entire I-cache.
> Do you mean the instruction prefetch queue?
> I know that Intel say to use a far jump instruction to flush that.
> It is required when switching to real mode, for example.
>

Yes. The instruction cache, variously called the pre-fetch buffer.

> > What I really wanted to do was, " in Intel code"...
> >
> > PUSH CS ; This is a long-word (32-bit segment)
> > PUSH OFFSET DOIT ; This is a long-word (32-bit segment)
> > RETF ; Far 'return' to DOIT address.
> > DOIT: SUB EAX,1
> > JNC DOIT
> >
> > This would have made a 'far' JMP, using a far RET, to the loop.
> > This guarantees a clean cache.
>
> Again, it sounds like you mean the prefetch queue.

I am always referring to the instruction cache, also called the prefetch
queue.

> I wonder if the
> above code uses RETF instead of JMPF on the grounds that the processor
> can't do branch prediction with RETF. Because since MMX, the processor
> has a "return stack buffer" for predicting the target of returns... I
> don't know if that would be relevant though.
>

It should not matter as long as the jump is a 32-bit jump. Using the
current cs and the offset within the 32-bit segment, pushing them
on the stack and executing a far return will guarantee that.

> > However, the GNU pseudo-assembler would not resolve the address of
> > the DOIT label (it is only known at link-time), so I would have
> > to use some trick that I don't know about.
>
> The GNU assembler has no trouble. This should work:
>
> pushl %cs
> pushl $0f
> lretl
> .balign 32
> 0:
> subl $1,%eax
> jnz 0b
>

My attempts show that the GNU assembler pushes a near displacement,
rather than a far offset, onto the stack. The chaotic result is the
reason why I didn't do this.

> decl %eax would give a bigger BogoMIPS rating on 386s, that might please
> really poor people :-) though you'd want an explicit check for %eax = 0.

The original code was unchanged in the loop.

> > Therefore, I had to hack around with '486s, '586s, and '686s to find
> > a jump combination that would work.
>
> I recommend against this.
>
> Intel documents the far jump's effect on the prefetch queue, when
> describing how to switch to real mode. You've already found out that a
> single near jump doesn't flush the queue properly -- I strongly suspect
> that has to do with branch prediction and/or the size of the two line
> I-cache decoding buffer used in some processors. In future, two jumps
> on 16 byte alignment may not be adequate.

The documentation, Page G3, CACHE and CODE ALIGNMENT, Intel486
Programmer's Reference Manual, says that instruction fetches are 16 bytes
and it takes 16 bytes to fill a cache-line.

Intel586, Page 210, mentions the same 16 bytes but does not claim
that alignment on greater than 0-mod-4 addresses will improve anything.
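
As a rough illustration of why placement matters with 16-byte lines, the
same small loop can sit inside one line or straddle two, depending on
where the linker happens to put it. This is only a sketch: the helper
names lines_spanned() and is_aligned() are hypothetical, and the 5-byte
loop length used in the note below is an assumed size for SUB EAX,1
followed by a short JNC.

```c
#include <stdint.h>

/* Number of cache lines touched by a code region of `len` bytes that
 * starts at `addr`, for a line size of `line` bytes (16 in the manuals
 * cited above). Hypothetical helper for illustration only; `len` and
 * `line` must be nonzero. */
static unsigned lines_spanned(uint32_t addr, uint32_t len, uint32_t line)
{
    uint32_t first = addr / line;             /* index of the first line */
    uint32_t last = (addr + len - 1) / line;  /* index of the last line  */
    return (unsigned)(last - first + 1);
}

/* Nonzero if `addr` is a multiple of `align` (e.g. 0-mod-16). */
static int is_aligned(uint32_t addr, uint32_t align)
{
    return addr % align == 0;
}
```

With a 16-byte line, lines_spanned(0x1000, 5, 16) gives 1, while
lines_spanned(0x100C, 5, 16) gives 2, so a rebuild that shifts the loop
by a few bytes can change how many line fills it needs.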

I don't have a 686 manual.

> No, the only way to do this is the _documented_ way (ick, if I only had
> the document handy :-), which is some form of far jump as you were
> trying to do.
>

Don't forget the reason for this. The reason was to eliminate getting
a different BogoMIPS reading when the kernel is rebuilt.
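
For reference, a sketch of how a reading such as the 66.15 in the
signature below can be formed from a calibrated loop count. The function
name format_bogomips() is hypothetical, and the arithmetic assumes the
usual convention that one BogoMIPS corresponds to 500000 delay-loop
iterations per second, rounded to two decimals; it is not a copy of the
kernel source.

```c
#include <stdio.h>

/* Format a BogoMIPS reading from a loops-per-second count.
 * Assumption: 1 BogoMIPS == 500000 loop iterations per second, printed
 * with two decimal places. format_bogomips() is an illustrative name. */
static int format_bogomips(unsigned long loops_per_sec, char *buf, size_t n)
{
    unsigned long whole = (loops_per_sec + 2500) / 500000;
    unsigned long frac = ((loops_per_sec + 2500) / 5000) % 100;
    return snprintf(buf, n, "%lu.%02lu", whole, frac);
}
```

Under these assumptions a loop measured at 33075000 iterations per second
is reported as 66.15, and any change in the loop's timing caused by a
shifted alignment moves the reported figure accordingly.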

Cheers,
Dick Johnson
***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.1.123 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
