Re: [PATCH 08/36] AArch64: Kernel booting and initialisation

From: Catalin Marinas
Date: Fri Jul 20 2012 - 09:16:58 EST

On Fri, Jul 20, 2012 at 01:32:39PM +0100, Geert Uytterhoeven wrote:
> On Fri, Jul 20, 2012 at 12:52 PM, Catalin Marinas
> <catalin.marinas@xxxxxxx> wrote:
> > On Fri, Jul 20, 2012 at 09:28:12AM +0100, Arnd Bergmann wrote:
> >> On Friday 20 July 2012, Jon Masters wrote:
> >> > > I think it would be best to list the technical limitations, from the
> >> > > kernel's perspective, of the unsupported exception levels and the
> >> > > advantages of the supported exception levels here. If you want to guide
> >> > > system builders towards EL2, I think it'd be more convincing to document
> >> > > the relevant technical aspects (perhaps KVM needs facilities only
> >> > > available in EL2) than just providing an unexplained requirement.
> >> >
> >> > Unless you enter at EL2 you can never install a hypervisor. That's the
> >> > reason for the requirement for generally entering at EL2 when possible.
> >>
> >> How do nested hypervisors work in this scenario? Does the first-level
> >> hypervisor (counting from most privileged) provide a guest that starts
> >> in an emulated EL2 state, or is this done differently?
> >
> > Your favourite topic :). Self virtualisation is not easily possible, at
> > least not with how KVM on ARM is being implemented. The hardware does
> > not allow code running at EL1 to be told that it is at EL2 (or code
> > running at EL2 to be trapped at EL2). So for normal virtualisation,
> > guest OSes start at EL1 and they benefit from all the hardware
> > acceleration. If a guest OS wants to run KVM again, it won't have access
> > to the virtualisation extensions (EL2 system register access would cause
> > an undefined trap). The best it can do is run the nested guest OS in EL0
> > and trap accesses to system registers (not that different from Qemu).
> >
> > If such a feature is needed, the best approach is for all kernels, host or
> > guest, to always enter at (non-secure) EL1. The EL2 would have a clearly
> > defined HVC API for nested page tables, virtual interrupts, context
> > switching etc. This way, the host OS can inform the hypervisor that
> > guest OSes are allowed to use this API for their own nested guests. But
> > getting such hypervisor API right is a bit tricky and the feedback from
> > the KVM guys so far is that they need the flexibility of running their
> > own code at EL2. I guess another benefit is that both KVM and Xen could
> > use the same API.
> >
> > But is this feature really needed?
> Sure :-)
> A sysadmin can prevent me from running my own virtualization layer and
> managing my own virtual machines (that's why UserModeLinux is so interesting).
> Can software detect if it's running in EL1 or EL2 (and e.g. refuse to run)?

Yes, if it's running at EL1 or higher. It can read the CurrentEL
register which is not virtualised. If it's running at EL0, CurrentEL
access can be trapped into the kernel at EL1 and return something else.
But when you run a guest in EL0 you don't benefit from the
virtualisation extensions, so UML may actually be faster.
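
As a rough illustration of the CurrentEL check (a sketch only, not code
from this patch series): the exception level is held in bits [3:2] of
the register, so software running at EL1 or higher could decode it as
below. The helper names current_el_from_raw() and running_at_el2() are
made up for this example, and the raw value is passed in as a parameter
so the decoding can be exercised without executing the MRS instruction.

```c
#include <stdint.h>

/* CurrentEL holds the exception level in bits [3:2]. */
#define CURRENTEL_EL_SHIFT	2
#define CURRENTEL_EL_MASK	0x3u

/* Hypothetical helper: decode a raw CurrentEL value into 0..3. */
static inline unsigned int current_el_from_raw(uint64_t raw)
{
	return (unsigned int)((raw >> CURRENTEL_EL_SHIFT) & CURRENTEL_EL_MASK);
}

/*
 * On real hardware, at EL1 or higher, the raw value would come from
 * something like:
 *
 *	asm volatile("mrs %0, CurrentEL" : "=r" (raw));
 */

/* Hypothetical helper: non-zero if the raw value reports EL2. */
static inline int running_at_el2(uint64_t raw)
{
	return current_el_from_raw(raw) == 2;
}
```

A raw value of 0x4 decodes to EL1 and 0x8 to EL2.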

For the second solution I mentioned above, the real host kernel still
starts in EL2 and installs the hypervisor code that is later used by the
same host kernel running at EL1 (via HVC calls; that's already the
case with KVM on ARM). Other guests could use the same HVC calls to
create/switch nested page tables, deliver virtual interrupts, handle
faults etc. The hypervisor code needs to be aware of the multiple levels
of nesting (and the host-guest relation between the running OSes) as the
hardware only supports two stages of translation tables.
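
To make the bookkeeping a bit more concrete, here is a minimal sketch of
how EL2 code might track the host-guest relation and decide whether a
caller may use the HVC API. Everything in it is hypothetical: no such
API exists in the tree, and struct hyp_os, enum hyp_call,
hyp_nesting_depth(), hyp_call_allowed() and hyp_handle_call() are names
invented for illustration. It assumes that each OS records its parent
and that a parent sets a flag to let a guest create nested guests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical HVC function IDs; not an existing ABI. */
enum hyp_call {
	HYP_SET_NESTED_PGTABLE,
	HYP_INJECT_VIRQ,
	HYP_SWITCH_CONTEXT,
};

/* Hypothetical per-OS record kept by the hypervisor at EL2. */
struct hyp_os {
	const struct hyp_os *parent;	/* NULL for the real host kernel */
	bool may_nest;			/* set by the parent to allow nesting */
};

/* Number of OSes between this one and the real host (0 for the host). */
static unsigned int hyp_nesting_depth(const struct hyp_os *os)
{
	unsigned int depth = 0;

	while (os->parent) {
		depth++;
		os = os->parent;
	}
	return depth;
}

/*
 * The host may always use the API. A guest may use it only if it and
 * every guest above it in the chain were given permission.
 */
static bool hyp_call_allowed(const struct hyp_os *caller)
{
	for (; caller->parent; caller = caller->parent)
		if (!caller->may_nest)
			return false;
	return true;
}

/* Returns 0 if the call is accepted, -1 if it is refused. */
static int hyp_handle_call(const struct hyp_os *caller, enum hyp_call call)
{
	if (!hyp_call_allowed(caller))
		return -1;

	switch (call) {
	case HYP_SET_NESTED_PGTABLE:
	case HYP_INJECT_VIRQ:
	case HYP_SWITCH_CONTEXT:
		/* Real work (shadow tables, vGIC state, ...) would go here. */
		return 0;
	}
	return -1;
}
```

Since the hardware only offers two stages of translation, a real
implementation would also have to use the nesting depth to collapse
the extra levels into those two stages; the sketch leaves that out.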

The advantage of the standardised API (and with open source code in the
kernel tree) is that other virtualisation solutions (e.g. VMware) could
use it as well.
