Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

From: H. Peter Anvin
Date: Tue Sep 29 2015 - 14:20:05 EST


SGDT would be easy to use, and it is logical that it is faster since it reads an internal register. SIDT does too, but unlike the GDT, the IDT has a secondary limit (it can never be larger than 4096 bytes), so all limits in the range 4095-65535 are exactly equivalent.
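
To make the limit equivalence concrete, here is a minimal C sketch. It assumes 64-bit mode, where the IDT holds at most 256 gates of 16 bytes each; the macro and function names are made up for illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* Assumption: long mode, 256 vectors, 16-byte gate descriptors. */
#define IDT_VECTORS     256u
#define IDT_GATE_BYTES  16u
#define IDT_MAX_LIMIT   (IDT_VECTORS * IDT_GATE_BYTES - 1u)   /* 4095 */

/*
 * Hypothetical helper: the limit that actually matters for a given
 * IDTR limit value. Anything at or above 4095 covers all 256 vectors,
 * so those values behave identically.
 */
static inline uint16_t idt_effective_limit(uint16_t limit)
{
    return limit > IDT_MAX_LIMIT ? (uint16_t)IDT_MAX_LIMIT : limit;
}
```

With this, idt_effective_limit(4095) and idt_effective_limit(65535) give the same value, while a smaller limit is returned unchanged.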

Anything that causes a write to the GDT will #PF if the GDT is mapped read-only. So yes, we need to force the accessed bit to be set. This shouldn't be a problem and in fact ought to be a performance improvement.
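
A rough sketch of what forcing the accessed bit could look like on a raw 8-byte descriptor. The helper and macro names are hypothetical, not the kernel's; the bit positions follow the standard x86 descriptor layout, where the access byte occupies bits 40-47.

```c
#include <stdint.h>

#define DESC_ACCESSED  (1ULL << 40)  /* type bit 0 of a code/data descriptor */
#define DESC_S         (1ULL << 44)  /* 1 = code/data segment, 0 = system */

/*
 * Hypothetical helper: pre-set the accessed bit so that loading the
 * segment never needs to write the descriptor back to the GDT.
 * System descriptors (S=0) use the type field differently and are
 * left untouched.
 */
static inline uint64_t desc_force_accessed(uint64_t desc)
{
    if (desc & DESC_S)
        desc |= DESC_ACCESSED;
    return desc;
}
```

For example, a flat read/write data descriptor 0x00cf92000000ffff becomes 0x00cf93000000ffff, and a descriptor that already has the bit set is returned as-is.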

On September 29, 2015 10:35:38 AM PDT, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>On Sep 29, 2015 2:01 AM, "Ingo Molnar" <mingo@xxxxxxxxxx> wrote:
>>
>>
>> * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>>
>> > On 09/28/2015 09:58 AM, Ingo Molnar wrote:
>> > >
>> > > * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>> > >
>> > >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
>> > >>> NAK. We really should map the GDT read-only on all 64-bit systems,
>> > >>> since we can't hide the address from SGDT. Same with the IDT.
>> > >>
>> > >> Sorry, I don't understand your point.
>> > >
>> > > So the problem is that right now the SGDT instruction (which is
>> > > unprivileged) leaks the real address of the kernel image:
>> > >
>> > > fomalhaut:~> ./sgdt
>> > > SGDT: ffff88303fd89000 / 007f
>> > >
>> > > that 'ffff88303fd89000' is a kernel address.
>> >
>> > Thank you.
>> > I do know that SGDT and friends are unprivileged on x86
>> > and thus they allow userspace (and guest kernels in paravirt)
>> > to learn things they don't need to know.
>> >
>> > I don't see how making GDT page-aligned and page-sized
>> > changes anything in this regard. SGDT will still work,
>> > and still leak GDT address.
>>
>> Well, as I try to explain it in the other part of my mail, doing so
>> enables us to remap the GDT to a less security sensitive virtual
>> address that does not leak the kernel's randomized address:
>>
>> > > Your observation in the changelog and your patch:
>> > >
>> > >>>> It is page-sized because of paravirt. [...]
>> > >
>> > > ... conflicts with the intention to mark (remap) the primary GDT
>> > > address read-only on native kernels as well.
>> > >
>> > > So what we should do instead is to use the page alignment properly
>> > > and remap the GDT to a read-only location, and load that one.
>> >
>> > If we'd have a small GDT (i.e. what my patch does), we still can
>> > remap the entire page which contains small GDT, and simply don't
>> > care that some other data is also visible through that RO page.
>>
>> That's generally considered fragile: suppose an attacker has a limited
>> information leak that can read absolute addresses with system privilege
>> but he doesn't know the kernel's randomized base offset. With a
>> 'partial page' mapping there could be function pointers near the GDT,
>> part of the page the GDT happens to be on, that leak this information.
>>
>> (Same goes for crypto keys or other critical information (like canary
>> information, salts, etc.) accidentally ending up nearby.)
>>
>> Arguably it's a bit tenuous, but when playing remapping games it's
>> generally considered good to be page aligned and page sized, with zero
>> padding.
>>
>> > > This would have a couple of advantages:
>> > >
>> > > - This would give kernel address randomization more teeth on x86.
>> > >
>> > > - An additional advantage would be that rootkits overwriting the
>> > > GDT would have a bit more work to do.
>> > >
>> > > - A third advantage would be that for NUMA systems we could
>> > > 'mirror' the GDT into node-local memory and load those. This makes
>> > > GDT load cache-misses a bit less expensive.
>> >
>> > GDT is per-cpu. Isn't per-cpu memory already NUMA-local?
>>
>> Indeed it is:
>>
>> fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
>> SGDT: ffff88103fa09000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa69000 / 007f
>>
>> I confused it with the IDT, which is still global.
>>
>> This also means that the GDT in itself does not leak kernel addresses
>> at the moment, except it leaks the layout of the percpu area.
>>
>> So my suggestion would be to:
>>
>> - make the GDT unconditionally page aligned and sized, then remap it
>> to a read-only address unconditionally as well, like we do it for the
>> IDT.
>
>Does anyone know what happens if you stick a non-accessed segment in
>the GDT, map the GDT RO, and access it? The docs are extremely vague
>on the interplay between segmentation and paging on the segmentation
>structures themselves. My guess is that it causes #PF. This might
>break set_thread_area users unless we change set_thread_area to force
>the accessed bit on.
>
>There's a possible worse failure mode: if someone pokes an un-accessed
>segment into SS or CS using sigreturn, then it's within the realm of
>possibility that IRET would generate #PF (hey Intel and AMD, please
>document this!). I don't think that would be rootable, but at the
>very least we'd want to make sure it doesn't OOPS by either making it
>impossible or adding an explicit test to sigreturn.c.
>
>hpa pointed out in another thread that the GDT *must* be writable on
>32-bit kernels because we use a task gate for NMI and jumping through
>a task gate writes to the GDT.
>
>On another note, SGDT is considerably faster than LSL, at least on
>Sandy Bridge. The vdso might be able to take advantage of that for
>getcpu.
>
>--Andy
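
The source of the ./sgdt tool used in the quoted output is not part of this thread. A guess at what such a tool might look like, as a minimal C sketch for x86-64 with GCC/Clang inline asm: the struct and function names are invented here, and the output format is simply copied from the lines quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Memory operand written by SGDT in 64-bit mode: 2-byte limit, 8-byte base. */
struct dtr {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

#if defined(__x86_64__)
/*
 * Store the current CPU's GDTR. Assumption: the instruction is allowed
 * from user space; on systems with UMIP enabled it may instead fault
 * or be emulated by the kernel.
 */
static inline struct dtr read_gdtr(void)
{
    struct dtr d;
    __asm__ volatile("sgdt %0" : "=m"(d));
    return d;
}
#endif

/* Format in the style of the quoted output: "SGDT: <base> / <limit>". */
static inline int format_dtr(char *buf, size_t len, const struct dtr *d)
{
    return snprintf(buf, len, "SGDT: %016llx / %04x",
                    (unsigned long long)d->base, (unsigned)d->limit);
}
```

A main() would call read_gdtr(), pass the result to format_dtr() and print the buffer; pinning the process to one CPU first (as the taskset loop above does) shows the per-cpu base.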
