Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

From: Jason A. Donenfeld
Date: Fri Dec 30 2022 - 12:07:48 EST


On Fri, Dec 30, 2022 at 6:01 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> On Fri, Dec 30, 2022 at 04:54:27PM +0100, Jason A. Donenfeld wrote:
> > > Right, with CONFIG_X86_VERBOSE_BOOTUP=y in a guest here, it says:
> > >
> > > early console in extract_kernel
> > > input_data: 0x000000000be073a8
> > > input_len: 0x00000000008cfc43
> > > output: 0x0000000001000000
> > > output_len: 0x000000000b600a98
> > > kernel_total_size: 0x000000000ac26000
> > > needed_size: 0x000000000b800000
> > > trampoline_32bit: 0x000000000009d000
> > >
> > > so that's a ~9M kernel which gets decompressed at 0x1000000 and the
> > > output len is, what, ~180M which looks like plenty to me...
> >
> > I think you might have misunderstood the thread.
>
> Maybe...
>
> I've been trying to make sense of all the decompression dancing we're doing and
> the diagrams you've drawn and there's a comment over extract_kernel() which
> basically says that the compressed image is at the back of the memory buffer
>
> input_data: 0x000000000be073a8
>
> and we decompress to a smaller address
>
> output: 0x0000000001000000
>
> And, it *looks* like setup_data is at an address somewhere >= 0x1000000 so when
> we start decompressing, we overwrite it.
>
> I guess extract_kernel() could also dump the setup_data addresses so that we can
> verify that...
>
> > First, to reproduce the bug that this patch fixes, you need a kernel with a
> > compressed size of around 16 megs, not 9.
>
> Not only that - you need setup_data to be placed somewhere just so that it gets
> overwritten during decompression. Also, see below.
>
> > Secondly, that crash is well understood and doesn't need to be reproduced;
>
> Is it?
>
> Where do you get the 0x100000 as the starting address of the kernel image?
>
> Because input_data above says that the input address is somewhere higher...

Look closer at the boot process. The compressed image is initially at
0x100000, but it gets relocated to a safer area at the end of
startup_64:

/*
* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
*/
pushq %rsi
leaq (_bss-8)(%rip), %rsi
leaq rva(_bss-8)(%rbx), %rdi
movl $(_bss - startup_32), %ecx
shrl $3, %ecx
std
rep movsq
cld
popq %rsi

/*
* The GDT may get overwritten either during the copy we just did or
* during extract_kernel below. To avoid any issues, repoint the GDTR
* to the new copy of the GDT.
*/
leaq rva(gdt64)(%rbx), %rax
leaq rva(gdt)(%rbx), %rdx
movq %rdx, 2(%rax)
lgdt (%rax)

/*
* Jump to the relocated address.
*/
leaq rva(.Lrelocated)(%rbx), %rax
jmp *%rax

And that address winds up being calculated with a combination of the
image size and the init_size param, so it's safe. Decompression won't
overwrite the compressed kernel.

HOWEVER, qemu currently appends setup_data to the end of the
compressed kernel image, and this part isn't moved, and setup_data
links aren't walked/relocated. So that means the original address
remains, of 0x100000.

Jason