Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

From: Philippe Mathieu-Daudé
Date: Mon Jan 23 2023 - 03:26:27 EST


On 30/12/22 23:07, Jason A. Donenfeld wrote:
The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

kernel image setup_data
|--------------------------||----------------|
0x100000 0x100000+l1 0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

kernel image setup_data
|--------------------------||----------------|
0x100000 0x100000+l1 0x100000+l1+l2

d e c o m p r e s s e d k e r n e l
|-------------------------------------------------------------|
0x1000000 0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up. So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Cc: x86@xxxxxxxxxx
Cc: Philippe Mathieu-Daudé <philmd@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Eric Biggers <ebiggers@xxxxxxxxxx>
Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
---
Changes v2->v3:
- Fix mistakes in string handling.
Changes v1->v2:
- Append setup_data to cmdline instead of kernel image.

hw/i386/microvm.c | 13 ++++++----
hw/i386/x86.c | 50 +++++++++++++++++++--------------------
hw/nvram/fw_cfg.c | 9 +++++++
include/hw/i386/microvm.h | 5 ++--
include/hw/nvram/fw_cfg.h | 9 +++++++
5 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a00881bc64..432754eda4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
}
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
+{
+ int arch = !!(key & FW_CFG_ARCH_LOCAL);
+
+ key &= FW_CFG_ENTRY_MASK;
+ assert(key < fw_cfg_max_entry(s));
+ return s->entries[arch][key].data;

Shouldn't it be safer to provide a size argument, and return
NULL if s->entries[arch][key].len < size?

Maybe this API should return a (casted) const pointer, so the
only way to update the key is via fw_cfg_add_bytes().

+}
+

diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 2e503904dc..990dcdbb2e 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
void *data, size_t len,
bool read_only);
+/**
+ * fw_cfg_read_bytes_ptr:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ *
+ * Reads an existing fw_cfg data pointer.
+ */
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
+
/**
* fw_cfg_add_string:
* @s: fw_cfg device being modified