Re: Xen & VMI?

From: Ingo Molnar
Date: Wed Mar 07 2007 - 03:16:51 EST



* Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:

> On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
> > maybe i shouldn't call it 'VMI' but 'the paravirt ABI'. I don't mind if
> > it's the Xen ABI or the VMware ABI or a mesh of the two - everyone can
> > map their own internals to that /one/ ABI.
>
> I think it's an excellent aim, but it's *HARD*. I rejected this
> approach earlier because I'm just not smart enough. (Yet?)
>
> The Linux side is fairly stable. The hardware side is changing, and
> the hypervisor side is changing. This means the ABI will churn fairly
> fast. The hypervisors are very different, which means the ABI will be
> very wide.

the 'hardware is changing fast so we cannot do a sane API' argument
sounds good at first, but in this context it is still fundamentally
wrong. Hardware has changed /dramatically/ since we started Linux, yet
we didn't have to make dramatic changes to the system call API/ABI. Why?
Because hardware, too, is fundamentally controlled by the rules of this
world. So if you know the laws of physics, math and computer science,
you /CAN/ do a sane API that lives for quite some time. We had these
kinds of discussions when Linux was just a few years old - many people
were worried that 'the hardware changes too fast' - but if the
fundamentals are strong, it _doesn't really matter_, as long as our
interfaces are sane and we quickly adapt our internals.

On the other hand, Linux's internal details, semantics and approaches
are a lot more ad-hoc and a lot more affected by changes in the hardware
environment - that's why i'd not like to see some external ABI
constraint limit aspects of those internals.

For example, VMI_CALL_SetAlarm takes a 'cycles' argument. Cycles are
quite a bad unit for an API; it should be based on absolute time, in
nanoseconds or picoseconds, instead. We could easily see CPUs that have
/no concept of cycles/, at all! Even today's CPUs have hardly any fixed
concept of cycles, due to cpufreq. It's as if 15 years ago we had based
sys_mmap() around the concept of '16-bit segments'. We could certainly
make it work on current hardware but it would look pretty awkward today.
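
to make the unit problem concrete, here is a minimal C sketch. It is
only an illustration: cycles_to_ns(), ns_to_cycles() and
cycle_alarm_fires_after_ns() are hypothetical helpers, not part of VMI
or of the kernel, and the clock rates are made-up numbers. It assumes a
guest computes a cycle count for an alarm at one clock rate and the CPU
then runs at another:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a cycle count to nanoseconds at a given
 * clock rate. The seconds/remainder split avoids overflowing
 * cycles * NSEC_PER_SEC; it is safe for clock rates below roughly 18 GHz.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t hz)
{
	assert(hz != 0);
	uint64_t secs = cycles / hz;
	uint64_t rem = cycles % hz;

	return secs * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / hz;
}

/* Hypothetical helper: convert nanoseconds to cycles at a given clock rate. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t hz)
{
	uint64_t secs = ns / NSEC_PER_SEC;
	uint64_t rem = ns % NSEC_PER_SEC;

	return secs * hz + (rem * hz) / NSEC_PER_SEC;
}

/*
 * How long a cycle-based alarm really takes to fire when the cycle count
 * was computed for assumed_hz but the clock actually runs at actual_hz.
 */
static uint64_t cycle_alarm_fires_after_ns(uint64_t want_ns,
					   uint64_t assumed_hz,
					   uint64_t actual_hz)
{
	uint64_t cycles = ns_to_cycles(want_ns, assumed_hz);

	return cycles_to_ns(cycles, actual_hz);
}
```

with these helpers, a 1 msec alarm computed at 2 GHz is 2,000,000
cycles; if cpufreq drops the clock to 1 GHz, the same cycle count takes
2 msec to elapse. A nanosecond-based argument has no such dependency on
the clock rate.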

in fact hardware changes a lot more by just going from one Linux arch to
another - still the system call API is essentially the same. (with
small, non-fundamental variations)

furthermore, most of the details in VMI or in Xen's low-level APIs (where
most of the overlap currently is - VMI doesn't have all that many
high-level APIs) are cast into stone. The i386 arch is not going to
change, ever. Most details of the x86_64 arch are not going to change,
ever. It's unclear whether there will ever be the need for any x86_128
arch (for humans). So it should be quite possible to come up with
something sane for these low-level details, and cast it into stone, for
everyone. Just like the chip makers cast it into silicon.

the more nontrivial (and thus more harmful, because more
design-limiting) bits are the high-level APIs.

> We could start with VMI and try to support Xen, KVM and lguest. It
> would at least give us a better idea of the scope of the problem. But
> IMHO it's a *huge* job.

yeah, it's a nontrivial job - like writing a sane OS. But it's doable
and we are in fact out here trying to do exactly that, right? ;-)

Ingo