Re: Simple x86 Simulator (was: Re: Flame Linus to a crisp!)

From: Antonio Vargas (wind@cocodriloo.com)
Date: Fri Apr 25 2003 - 06:44:34 EST


On Fri, Apr 25, 2003 at 05:10:44PM +0100, John Bradford wrote:
> > > We could not. Consider just the 8 32-bit-wide legacy x86 registers,
> > > excluding the MMX and FPU registers:
> > > (AX, BX, CX, DX, BP, SI, DI, SP). 32 bits x 8 = 256 bits of
> > > state, i.e. 2^256 independent states to look up in the table,
> > > each state holding 256 bits of information: 2^264 total bits
> > > needed. Assume 1 GB DIMMs (2^30 * 8 bits each = 2^33 bits of
> > > info), with a volume of 10 cm^3 per DIMM (including a tiny
> > > amount of space for air circulation). You would need
> > > 34508731733952818937173779311385127262255544860851932776 cubic
> > > kilometers of space.
> > >
> > > Considerably larger than the volume of the earth, although admittedly
> > > smaller than the total volume of the universe.
> > > --Steven Augart
> >
> > If this could be done, someone would have done it already.
>
> It's certainly possible to implement most of the functionality of a
> very simple processor this way, but applying the idea to an X86
> compatible processor was a joke.
>
> What interests me now is whether we could cache the results of certain
> opcode strings in a separate memory area.
>
> Say for example, you have a complicated routine that runs in to
> hundreds of opcodes, which is being applied to large amounts of data,
> word by word. If one calculation doesn't depend on another, you could
> cache the results, and then merely fetch them from the results cache
> when the input data repeats itself.
>
> I.e. the processor dynamically makes its own look-up tables.

This is called dynamic programming (more precisely, memoization) and
is done by keeping a cache of previous results. Take for example a
simple Fibonacci function:

int fib(int n){
  if(n < 2) return n;  /* base cases: fib(0) = 0, fib(1) = 1 */
  return fib(n-2) + fib(n-1);
}

Now hook up a direct-mapped cache:

int cache[65536];

int fib(int n){
  if(n < 2) return n;  /* base cases bypass the cache */
  int x = cache[n];
  if(x) return x;      /* 0 means "not computed yet" */

  x = fib(n-2) + fib(n-1);

  cache[n] = x;
  return x;
}

Of course, a fixed table like this limits the range too much, so
coding an LRU or some other cache-replacement scheme would consume
less memory and give better speed.

Your idea, applying this to general hardware execution could
be _really_ nice in fact :)

Greets, Antonio.

ps. I recall that today's hardware does this for some stuff:
    the TLB cache replaces an expensive and sometimes
    complex page-table-tree walk (m68030 mmu-docs come
    to mind)



This archive was generated by hypermail 2b29 : Wed Apr 30 2003 - 22:00:22 EST