Re: [PATCH] x86 / hibernate: Fix 64-bit code passing control to image kernel

From: Rafael J. Wysocki
Date: Tue Jun 14 2016 - 18:41:45 EST


On Tuesday, June 14, 2016 08:06:49 PM chenyu wrote:
> On Mon, Jun 13, 2016 at 9:42 PM, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Logan Gunthorpe reports that hibernation stopped working reliably for
> > him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
> > and rodata). Most likely, what happens is that the page containing
> > the image kernel's entry point is sometimes marked as non-executable
> > in the page tables used at the time of the final jump to the image
> > kernel. That at least is why commit ab76f7b4ab23 may matter.
> >
> > However, there is one more long-standing issue with the code in
> > question, which is that the temporary page tables set up by it
> > to avoid page table corruption when the last bits of the image
> > kernel's memory contents are copied into their original page frames
> > re-use the boot kernel's text mapping, but that mapping may very
> > well get corrupted just like any other part of the page tables.
> > Of course, if that happens, the final jump to the image kernel's
> > entry point will go to nowhere.
> >
> A 100-round test has passed with this patch on top of 4.7-rc3.
> Tested-by: Chen Yu <yu.c.chen@xxxxxxxxx>
>
> BTW, I'm thinking of another possible explanation for why this patch
> fixes the NX issue, based on the log previously provided by Logan in
> bugzilla 116941:
>
> without ab76f7b4ab23:
>
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000    16M                    pmd
> 0xffffffff81000000-0xffffffff81600000     6M  ro  PSE  GLB  x   pmd
> 0xffffffff81600000-0xffffffff81800000     2M  ro  PSE  GLB  NX  pmd
> 0xffffffff81800000-0xffffffff81c00000     4M  RW       GLB  NX  pte
> 0xffffffff81c00000-0xffffffffa0000000   484M                    pmd
>
> with ab76f7b4ab23:
>
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000    16M                    pmd
> 0xffffffff81000000-0xffffffff81400000     4M  ro  PSE  GLB  x   pmd
> 0xffffffff81400000-0xffffffff8155e000  1400K  ro       GLB  x   pte
> 0xffffffff8155e000-0xffffffff81600000   648K  RW       GLB  NX  pte
> 0xffffffff81600000-0xffffffff81800000     2M  ro  PSE  GLB  NX  pmd
> 0xffffffff81800000-0xffffffff81c00000     4M  RW       GLB  NX  pte
> 0xffffffff81c00000-0xffffffffa0000000   484M                    pmd
>
> ffffffff81446bb0 T restore_registers
>
>
> It looks like after the NX modification, the 'huge page' text mapping
> is split into smaller pieces, from pmd to pte mappings. The original
> pmd is located in the .data section (which should be the same across
> hibernation), whereas after the modification the pte table is
> allocated dynamically. We cannot guarantee that the dynamically
> allocated pte table is the same across hibernation, so the kernel
> entry point, restore_registers, might become inaccessible because of
> a broken page table.

Right.

Quite frankly, I suspected something like that, but wasn't quite sure, so
thanks a lot for that analysis!
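
For illustration only, here is a small sketch that checks the quoted dumps
against the restore_registers address. The ranges and the symbol address are
copied from the log above; the helper name find_mapping and the table layout
are assumptions made for this example, not anything from the kernel sources.

```python
# Ranges copied from the two quoted dumps: (start, end, mapping level).
WITHOUT_COMMIT = [
    (0xffffffff80000000, 0xffffffff81000000, "pmd"),
    (0xffffffff81000000, 0xffffffff81600000, "pmd"),
    (0xffffffff81600000, 0xffffffff81800000, "pmd"),
    (0xffffffff81800000, 0xffffffff81c00000, "pte"),
    (0xffffffff81c00000, 0xffffffffa0000000, "pmd"),
]
WITH_COMMIT = [
    (0xffffffff80000000, 0xffffffff81000000, "pmd"),
    (0xffffffff81000000, 0xffffffff81400000, "pmd"),
    (0xffffffff81400000, 0xffffffff8155e000, "pte"),
    (0xffffffff8155e000, 0xffffffff81600000, "pte"),
    (0xffffffff81600000, 0xffffffff81800000, "pmd"),
    (0xffffffff81800000, 0xffffffff81c00000, "pte"),
    (0xffffffff81c00000, 0xffffffffa0000000, "pmd"),
]

# Address of restore_registers as listed in the quoted log.
RESTORE_REGISTERS = 0xffffffff81446bb0


def find_mapping(addr, mappings):
    """Return the (start, end, level) entry covering addr, or None."""
    for start, end, level in mappings:
        if start <= addr < end:
            return start, end, level
    return None


for name, table in (("without", WITHOUT_COMMIT), ("with", WITH_COMMIT)):
    start, end, level = find_mapping(RESTORE_REGISTERS, table)
    size_kib = (end - start) // 1024
    print(f"{name} ab76f7b4ab23: {start:#x}-{end:#x} ({size_kib}K) at {level} level")
```

With these numbers, restore_registers sits in a pmd-level (huge page) range
before the commit and in the 1400K pte-level range after it, which is
consistent with the analysis above.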

Rafael