Re: [BUG] x86/efi: MMRs no longer properly mapped after switch to isolated page table

From: Borislav Petkov
Date: Mon May 02 2016 - 06:02:42 EST


On Fri, Apr 29, 2016 at 10:41:19AM -0500, Alex Thorlton wrote:
> I think this is partially correct, but in doing that, we find that we're
> still missing something. Watch what happens when I make this small
> tweak to my kernel:
>
> 8<---
> diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c
> index 624db005..91ac029 100644
> --- a/arch/x86/kernel/apic/x2apic_uv_x.c
> +++ b/arch/x86/kernel/apic/x2apic_uv_x.c
> @@ -891,7 +891,7 @@ void __init uv_system_init(void)
>  	pr_info("UV: Found %s hub\n", hub);
>
>  	/* We now only need to map the MMRs on UV1 */
> -	if (is_uv1_hub())
> +	//if (is_uv1_hub())
>  		map_low_mmrs();
>
>  	m_n_config.v = uv_read_local_mmr(UVH_RH_GAM_CONFIG_MMR );
> --->8
>
> Here's the result:
>
> 8<---
> [ 5.353656] BUG: unable to handle kernel paging request at ffff88006a1ab938
> [ 5.361448] IP: [<ffff88006a1ab938>] 0xffff88006a1ab938
> [ 5.367290] PGD 1f81067 PUD 87ffff067 PMD 87fff8067 PTE 0
> [ 5.373356] Oops: 0010 [#1] SMP
> [ 5.376977] Modules linked in:
> [ 5.380395] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.5.0-rc2-uv4-comm-debug-fix+ #538
> [ 5.389428] Hardware name: SGI UV3000/UV3000, BIOS SGI UV 3000 series BIOS 01/15/2015
> [ 5.398169] task: ffff880867ec4040 ti: ffff880867ec8000 task.ti: ffff880867ec8000
> [ 5.406522] RIP: 0010:[<ffff88006a1ab938>] [<ffff88006a1ab938>] 0xffff88006a1ab938
> [ 5.415080] RSP: 0000:ffff880867ecbc88 EFLAGS: 00010086
> [ 5.421006] RAX: 0000000000000000 RBX: 0000000000000282 RCX: 0000000000000001
> [ 5.428971] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88006a1ab938
> [ 5.436935] RBP: ffff880867ecbd58 R08: ffff880867ecbd68 R09: ffff880867ecbd70
> [ 5.444900] R10: ffffffffffffffda R11: 000000006a1ab938 R12: 0000000000000000
> [ 5.452864] R13: ffffffff81dcf0b8 R14: ffffffff81dcf0c0 R15: ffffffff81dcf0a0
> [ 5.460829] FS: 0000000000000000(0000) GS:ffff880878c00000(0000) knlGS:0000000000000000
> [ 5.469861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.476274] CR2: ffff88006a1ab938 CR3: 0000000001a0a000 CR4: 00000000001406f0
> [ 5.484240] Stack:
> [ 5.486483] ffffffff8105d7f8 0000000000000000 0000000000000006 0000000000000006
> [ 5.494777] 000000000000001e 0000000000000000 0000000000000000 ffff880867ecbd38
> [ 5.503074] 0000000080050033 0000000000000000 0000000000000000 0000000000000000
> [ 5.511368] Call Trace:
> [ 5.514098] [<ffffffff8105d7f8>] ? efi_call+0x58/0x90
> [ 5.519834] [<ffffffff8106033d>] ? uv_bios_call_irqsave+0x5d/0x80
> [ 5.526733] [<ffffffff810603a0>] uv_bios_get_sn_info+0x40/0xb0
> [ 5.533344] [<ffffffff81b6f824>] uv_system_init+0x772/0x104d
> [ 5.539751] [<ffffffff810bd479>] ? vprintk_default+0x29/0x40
> [ 5.546159] [<ffffffff81161cf8>] ? printk+0x4d/0x4f
> [ 5.551692] [<ffffffff81b6ac75>] native_smp_prepare_cpus+0x299/0x2e4
> [ 5.558884] [<ffffffff81b5c18e>] kernel_init_freeable+0xc3/0x21b
> [ 5.565680] [<ffffffff815acd00>] ? rest_init+0x80/0x80
> [ 5.571502] [<ffffffff815acd0e>] kernel_init+0xe/0xf0
> [ 5.577238] [<ffffffff815b87cf>] ret_from_fork+0x3f/0x70
> [ 5.583264] [<ffffffff815acd00>] ? rest_init+0x80/0x80
> [ 5.589093] Code: Bad RIP value.
> [ 5.592812] RIP [<ffff88006a1ab938>] 0xffff88006a1ab938
> [ 5.598748] RSP <ffff880867ecbc88>
> [ 5.602638] CR2: ffff88006a1ab938
> [ 5.606339] ---[ end trace 3abaacb020c74a50 ]---
> [ 5.611487] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
> --->8
>
> You can see here that we've made it past the MMR read in uv_system_init,
> but we die inside of our first EFI callback. In this example, it looks
> like we're using the kernel page table at the time of the failure, and I
> believe that the failing address is somewhere in our EFI runtime code:

I think I see what's going on:

[ 5.367290] PGD 1f81067 PUD 87ffff067 PMD 87fff8067 PTE 0

PTE 0, most probably because the low kernel mappings need to be synced
with efi_sync_low_kernel_mappings(). Why?

So the point in time at which this call is made is, IINM, after we have
enabled virtual mode. I.e., virtual mode is enabled in start_kernel(),
and your call stack points at rest_init(), which happens later in that
same function.
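
To make the "PTE 0" line above concrete, here is a rough userspace
sketch (not kernel code) of how a virtual address splits into the four
table indices under x86-64 4-level paging, and how the Present bit of an
entry is tested. The names pt_index, pt_decode() and entry_present() are
made up for illustration; only the bit layout (9 bits per level, 12-bit
page offset, bit 0 = Present) is taken from the architecture.

```c
#include <stdint.h>

/* Indices into each paging level for one virtual address (hypothetical type). */
struct pt_index {
	unsigned pgd;    /* bits 47:39 */
	unsigned pud;    /* bits 38:30 */
	unsigned pmd;    /* bits 29:21 */
	unsigned pte;    /* bits 20:12 */
	unsigned offset; /* bits 11:0, byte offset within a 4K page */
};

/* Split a virtual address into its 4-level paging indices. */
static struct pt_index pt_decode(uint64_t vaddr)
{
	struct pt_index i;

	i.pgd = (vaddr >> 39) & 0x1ff;
	i.pud = (vaddr >> 30) & 0x1ff;
	i.pmd = (vaddr >> 21) & 0x1ff;
	i.pte = (vaddr >> 12) & 0x1ff;
	i.offset = vaddr & 0xfff;
	return i;
}

/* Bit 0 of a page-table entry is the Present bit. */
static int entry_present(uint64_t entry)
{
	return (int)(entry & 1);
}
```

For the faulting address ffff88006a1ab938 this yields PGD index 272, PUD
index 1, PMD index 336, PTE index 427 and page offset 0x938. The PGD,
PUD and PMD entries in the oops (1f81067, 87ffff067, 87fff8067) all have
bit 0 set, while the PTE value 0 does not, i.e. the walk reaches the last
level and finds nothing mapped there.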

So IMO what you should be doing instead is using efi_call_virt() in
uv_bios_call(), which should take care of everything.

I think this naked efi_call() in uv_bios_call() should not be there;
instead, UV should be calling the _phys or _virt helpers from the EFI core.
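
As a toy illustration of why a naked call differs from going through a
helper, here is a small userspace mock. It does not reproduce what
efi_call_virt() actually does; it only models the general idea that a
helper sets up the state the firmware expects before the call and
restores it afterwards. Every name in it (fw_mappings_active,
mock_fw_entry(), naked_call(), wrapped_call()) is hypothetical and not a
kernel symbol.

```c
#include <stdint.h>

/* Hypothetical state: 1 while the firmware's mappings are in place. */
static int fw_mappings_active;
/* Counts calls that reached the firmware without its mappings. */
static int fault_count;

/* Mock firmware entry point: only works while its mappings are active. */
static int64_t mock_fw_entry(uint64_t arg)
{
	if (!fw_mappings_active) {
		fault_count++;	/* stands in for the paging fault */
		return -1;
	}
	return (int64_t)arg + 1;
}

/* "Naked" call: jumps to the firmware with whatever state is current. */
static int64_t naked_call(uint64_t arg)
{
	return mock_fw_entry(arg);
}

/* Wrapped call: set up the mappings, call, then restore the old state. */
static int64_t wrapped_call(uint64_t arg)
{
	int saved = fw_mappings_active;
	int64_t ret;

	fw_mappings_active = 1;
	ret = mock_fw_entry(arg);
	fw_mappings_active = saved;
	return ret;
}
```

With the mappings initially inactive, naked_call() fails and bumps
fault_count, while wrapped_call() succeeds and leaves the global state
as it found it.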

Preferably someone should go and audit all those EFI calls in UV and
figure out which one to use, _phys or _virt, depending on the point in
time at which each call is made.

For example, uv_system_init() should all be _virt calls, obviously.
And from a quick scan, most of the EFI calls are coming from there so
everything should be _virt.

Btw, uv_bios_call_reentrant() looks unused, might want to remove it
while at it.

Hmmm.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)