Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly

From: Thomas Gleixner
Date: Fri Aug 23 2019 - 20:59:46 EST


On Wed, 21 Aug 2019, Thomas Gleixner wrote:
> On Wed, 21 Aug 2019, Song Liu wrote:
> > > On Aug 20, 2019, at 1:23 PM, Song Liu <songliubraving@xxxxxx> wrote:
> > >
> > > Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
> > > This behavior changes after the 32-bit support: pti_clone_pgtable()
> > > increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
> > > PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
> > > addr may not be PUD_SIZE/PMD_SIZE aligned.
> > >
> > > Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
> > > in these two cases.
> >
> > After poking around more, I found the following doesn't really make
> > sense.
>
> I'm glad you figured that out yourself. Was about to write up something to
> that effect.
>
> Still interesting questions remain:
>
> 1) How did you end up feeding an unaligned address into that which points
> to a 0 PUD?
>
> 2) Is this related to Facebook specific changes and unlikely to affect any
> regular kernel? I can't come up with a way to trigger that in mainline
>
> 3) As this is a user page table and the missing mapping is related to
> mappings required by PTI, how is the machine going in/out of user
> space in the first place? Or did I just trip over what you called
> nonsense?

And just because this ended in silence I looked at it myself after Peter
told me that this was on a kernel with PTI disabled. Aside of that my built
in distrust for debug war stories combined with fairy tale changelogs
triggered my curiousity anyway.

So that cannot be a kernel with PTI disabled compile time because in that
case the functions are not available, unless it's FB hackery which I do not
care about.

So the only way this can be reached is when PTI is configured but disabled
at boot time via pti=off or nopti.

For some silly reason and that goes back to before the 32bit support and
Joern did not notice either when he introduced pti_finalize() this results
in the following functions being called unconditionallY:

pti_clone_entry_text()
pti_clone_kernel_text()

pti_clone_kernel_text() was called unconditionally before the 32bit support
already and the only reason why it did not have any effect in that
situation is that it invokes pti_kernel_image_global_ok() and that returns
false when PTI is disabled on the kernel command line. Oh well. It
obviously never checked whether X86_FEATURE_PTI was disabled or enabled in
the first place.

Now 32bit moved that around into pti_finalize() and added the call to
pti_clone_entry_text() which just runs unconditionally.

Now there is still the interesting question why this matters. The to be
cloned page table entries are mapped and the start address even if
unaligned never points to something unmapped. The unmapped case only covers
holes and holes are obviously aligned at the upper levels even if the
address of the hole is unaligned.

So colour me still confused what's wrong there but the proper fix is the
trivial:

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -666,6 +666,8 @@ void __init pti_init(void)
*/
void pti_finalize(void)
{
+ if (!boot_cpu_has(X86_FEATURE_PTI))
+ return;
/*
* We need to clone everything (again) that maps parts of the
* kernel image.

Hmm?

I'm going to look whether that makes any difference in the page tables
tomorrow with brain awake, but I wanted to share this before the .us
vanishes into the weekend :)

Thanks,

tglx