Re: [tip:efi/core] x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings

From: Matt Fleming
Date: Thu Feb 25 2016 - 10:27:42 EST


On Wed, 24 Feb, at 11:49:23AM, Andy Lutomirski wrote:
> On Wed, Feb 24, 2016 at 11:33 AM, Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, 24 Feb, at 08:36:33AM, Andy Lutomirski wrote:
> >> On Wed, Feb 24, 2016 at 8:20 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> >> > On Wed, Feb 24, 2016 at 02:10:46PM +0000, Matt Fleming wrote:
> >> >> > Normally, the only pages with _PAGE_GLOBAL set are those in
> >> >> > the normal kernel mappings (swapper_pg_dir and normal mm_struct pgds).
> >> >> > By allowing _PAGE_GLOBAL to be set in EFI mappings, you're breaking
> >> >> > that convention, which forces you to use extra-expensive
> >> >> > __flush_tlb_all calls in efi_call_virt.
> >> >
> >> > Hold on, do you mean the __flush_tlb_all() in the CONFIG_EFI_MIXED code?
> >> >
> >> > That's mixed mode. I think you mean the FLUSH_TLB_ALL in efi_call.
> >> > That's EFI on 64-bit but that is mandated by the spec, AFAIR.
> >>
> >> I mean the one in efi_call_virt. Why would the spec mandate a TLB
> >> flush at all? EFI runtime services have no business touching the
> >> paging structures directly. Heck, the 32-bit ones don't even know the
> >> *format* of the paging structures.
> >
> > Right, and it would necessitate copying out arguments because the
> > firmware won't understand where/how the kernel has mapped things.
> >
> > No firmware is going to be doing that.
>
> Just so I understand correctly: could we get away with putting the EFI
> virtual runtime mappings at positive (user) addresses for 64-bit UEFI,
> or is there some reason that we need the high bit set?

Good question. There are multiple parts to this answer:

1) Some firmware is known to break when entered via the identity
addresses (VA==PA).

2) Kexec cares the most about where we map things because the region
has to be static across kexec reboots. We do pass the kernel's EFI
memory map regions between kexec kernels, but that region clearly
needs to be available across kernel versions.

It shouldn't be possible to conflict with userspace mappings or
anything like that because we should never be accessing userspace
addresses during EFI runtime services calls - all relevant data is
copied to a kernel buffer or such. Userspace isn't even mapped now
that we've got completely separate EFI page tables.

I don't think there's anything else that would stop us clearing the
high bit and moving the EFI virtual mapping region somewhere else.
Boris?
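
To make the "high bit" terminology concrete: on x86-64 a "negative"
address is one in the upper (kernel) half of the canonical address
space, and a "positive" address is one in the lower (user) half. Below
is a minimal standalone sketch, assuming 48-bit virtual addresses
(4-level paging). The helper names are hypothetical and are not kernel
APIs, and the sample addresses in the note after it are illustrative
only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, i.e. 4-level paging. */
#define VA_BITS 48

/*
 * Hypothetical helper: an address is canonical when bits 63..47 are
 * all copies of bit 47 (all zeros or all ones).
 */
static bool va_is_canonical(uint64_t va)
{
	uint64_t upper = va >> (VA_BITS - 1);	/* bits 63..47, 17 bits */

	return upper == 0 || upper == 0x1FFFF;
}

/* Hypothetical helper: true for "negative" (kernel-half) addresses. */
static bool va_in_kernel_half(uint64_t va)
{
	return va_is_canonical(va) && (va >> 63) != 0;
}

/* Hypothetical helper: true for "positive" (user-half) addresses. */
static bool va_in_user_half(uint64_t va)
{
	return va_is_canonical(va) && (va >> 63) == 0;
}
```

With these helpers, 0x00007fffffffe000 lands in the user half,
0xffffffef00000000 lands in the kernel half, and 0x0000800000000000 is
non-canonical and belongs to neither. Moving the EFI region to positive
addresses would mean picking a range for which va_in_user_half() holds.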

> If we could use positive addresses, then we could use the existing
> use_mm infrastructure directly with no funny business at all except to
> the extent that we might need to use unusual APIs to set up the VMAs
> (if we use real VMAs) in the first place. (We could cheat and
> allocate a single monstrous VM_MIXEDMAP or VM_PFNMAP vma with a .fault
> handler that always fails.) If we have to use negative addresses,
> then we'll always be stuck with a funny pgd, but we could still
> probably use use_mm instead of manually fiddling with cr3.

We don't use VMAs at the moment.

Having a custom .fault handler could be a very interesting idea
because we've talked about wanting to do EFI-specific things in the
past if we fault while executing firmware, e.g. printing warnings in
the kernel log indicating the firmware is known to be buggy because it
performed an access not compliant with the spec. See 1) above.

> Some day I want to experiment with calling runtime services at CPL 3,
> too :) We'd want to add some infrastructure to permit kernel threads
> to run through the entry/exit code as if they were user processes, but
> there's nothing conceptually wrong with that. We already allow kernel
> threads to call execve and "return" to real user mode, so it's not
> much of a stretch. The main issue would be dealing with signal
> handling and such -- we'd want to report faults back to the kernel
> thread's CPL3-invocation thunks rather than delivering a signal at CPL
> 3.

Right, more isolation is better. I'm not sure we could get all the way
to CPL 3 but I wouldn't begrudge anyone trying.