Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd

From: H. Peter Anvin
Date: Mon Nov 12 2018 - 11:48:19 EST


On 11/11/18 10:19 PM, Ingo Molnar wrote:
>
>> In part as a result of this exchange I have spent some time thinking
>> about the boot protocol and its dependencies, and there is, in fact, a
>> much more serious problem that needs to be addressed: it is not
>> currently possible in a forward-compatible way to map all data areas
>> that may be occupied by bootloader-provided data. The kernel proper has
>> an advantage here, in that the kernel will by definition always be the
>> "owner of the protocol" (anything the kernel doesn't know how to map
>> won't be used by the kernel anyway), but it really isn't a good
>> situation. So I'm currently trying to think up a way to make that
>> possible.
>
> I might be a bit dense early in the morning, but could you elaborate?
> What do you mean by mapping all data areas?

Alright, awake now...

As it sits right now, the protocol contains a number of data structures with
pointers, pointing to a variety of memory areas that can be set up by the
bootloader. Now, consider something like KASLR or a secondary boot loader
where we need to allocate memory in between the primary bootloader and the
kernel to be run. With the kernel proper, in the absence of KASLR, we have
solved this by marking out exactly how much memory the kernel may need before
it has its own memory manager up and running, but KASLR needs to move it
outside this range, and a secondary boot loader shim of some sort may need to
allocate additional data structures. In the particular case of a UEFI system
where we do the right thing (which Grub2 doesn't, by default) and enter via
the kernel UEFI stub, we are okay, but for other boot scenarios we are in
trouble: even if we know today where all the pointers are and how to determine
the size of the various data structures, that knowledge is no longer complete
once the protocol is extended with new pointer fields.
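To make the failure mode concrete, here is a minimal C sketch of the placement search a secondary loader or KASLR-style relocator has to do. It assumes the caller has already gathered every occupied range into an array; `struct range`, `ranges_overlap()` and `find_free_slot()` are hypothetical names for illustration, not kernel code. The search can only avoid ranges it has been told about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A physical address range [start, start + size). (hypothetical type) */
struct range {
    uint64_t start;
    uint64_t size;
};

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/*
 * Hypothetical helper: find the lowest `align`-aligned address in
 * [lo, hi) where `size` bytes avoid every range in `busy`.
 * `align` must be a power of two. Any bootloader data NOT listed in
 * `busy` (e.g. reached via a pointer field this code predates) is
 * invisible here and may be chosen and overwritten.
 */
static bool find_free_slot(uint64_t lo, uint64_t hi, uint64_t size,
                           uint64_t align, const struct range *busy,
                           size_t nbusy, uint64_t *out)
{
    for (uint64_t addr = (lo + align - 1) & ~(align - 1);
         addr + size <= hi; addr += align) {
        struct range cand = { addr, size };
        size_t i;

        for (i = 0; i < nbusy; i++)
            if (ranges_overlap(cand, busy[i]))
                break;
        if (i == nbusy) {       /* no clash with any known range */
            *out = addr;
            return true;
        }
    }
    return false;
}
```

For example, with a single busy range from 1 MiB to 3 MiB, searching 1 MiB-aligned slots above 1 MiB yields 0x300000; if the bootloader had also left data at 0x300000 behind a pointer this code does not know about, the search would still pick it.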

The setup_data linked list solves that under certain circumstances, but in
others it has turned out to be inadequate.
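For reference, the list is rooted at the 64-bit `setup_data` field of the setup header, and each node carries a small header ahead of its payload. The sketch below shows how generic code can account for the memory the list itself occupies, assuming the addresses can be dereferenced directly (early boot code would have to map them first); `collect_setup_data_ranges()` and `struct range` are hypothetical names. What it cannot see is any memory that a node's payload points to.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct setup_data in the x86 boot protocol. */
struct setup_data {
    uint64_t next;   /* address of the next node; 0 terminates the list */
    uint32_t type;
    uint32_t len;    /* length of data[] in bytes */
    uint8_t  data[];
};

/* A physical address range [start, start + size). (hypothetical type) */
struct range {
    uint64_t start;
    uint64_t size;
};

/*
 * Hypothetical helper: record the memory occupied by each node
 * (header plus payload). Returns the number of nodes found and
 * writes at most `max` ranges. Memory referenced from inside a
 * payload is NOT discovered by this walk.
 */
static size_t collect_setup_data_ranges(uint64_t head,
                                        struct range *out, size_t max)
{
    size_t n = 0;

    for (uint64_t pa = head; pa != 0; ) {
        const struct setup_data *sd =
            (const struct setup_data *)(uintptr_t)pa;

        if (n < max) {
            out[n].start = pa;
            out[n].size  = sizeof(*sd) + sd->len;
        }
        n++;
        pa = sd->next;
    }
    return n;
}
```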

There are a couple of options:

a) Do not allow any new pointers to memory areas in what is considered system
RAM. Such data structures *must* have a setup_data linked list header.
Pointers into E820 table reserved areas are still acceptable.

b) Create a new E820 table memory type for "boot data", similar to what UEFI
already has, and encourage boot loaders to mark any allocated memory
structures that way. The main problem with that is that, given the poor
quality of boot loaders, this may simply not happen, and because a missing
mark wouldn't "fail hard", it is likely that they will get it wrong.

The difference from the RESERVED memory type is that the kernel can reclaim
that memory after the data has been recovered.

c) This might be the preferred option:

1. Just like (a), do not allow new pointers to memory areas in system RAM
in struct boot_params.
2. Create a subrange of struct setup_data types (e.g. types with bit 30 set)
explicitly containing pointers to other data structures, including their
sizes, in a way that can be parsed by generic code.
3. Encourage boot loaders to make sure the setup_data list is in order of
ascending address (and WARN if it is not.)
4. Add (b) as an option, for responsible boot loaders ;) to provide an
extra level of protection.
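Items 2 and 3 can be sketched in C as follows. The payload format is an assumption for illustration only: an entry whose type has bit 30 set is taken to carry a plain array of `{ addr, size }` descriptors, and `SETUP_HAS_POINTERS`, `struct setup_ptr` and `collect_pointed_ranges()` are hypothetical names. Generic code can then collect every referenced range without knowing what any individual type means, and flag a list that is not in ascending order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the layout of struct setup_data in the x86 boot protocol. */
struct setup_data {
    uint64_t next;
    uint32_t type;
    uint32_t len;
    uint8_t  data[];
};

/* HYPOTHETICAL: bit 30 of the type marks a pointer-carrying entry. */
#define SETUP_HAS_POINTERS (1u << 30)

/* HYPOTHETICAL payload format of such entries: an array of these. */
struct setup_ptr {
    uint64_t addr;
    uint64_t size;
};

struct range {
    uint64_t start;
    uint64_t size;
};

/*
 * Walk the list and collect every range referenced by a
 * pointer-carrying entry, whatever its specific type. Sets *ordered
 * to false (and warns, but keeps going) if node addresses are not
 * ascending. Returns the number of ranges found; writes at most `max`.
 */
static size_t collect_pointed_ranges(uint64_t head, struct range *out,
                                     size_t max, bool *ordered)
{
    size_t n = 0;
    uint64_t prev = 0;

    *ordered = true;
    for (uint64_t pa = head; pa != 0; ) {
        const struct setup_data *sd =
            (const struct setup_data *)(uintptr_t)pa;

        if (pa < prev) {
            *ordered = false;
            fprintf(stderr, "WARNING: setup_data list not in ascending order\n");
        }
        prev = pa;

        if (sd->type & SETUP_HAS_POINTERS) {
            const struct setup_ptr *p = (const struct setup_ptr *)sd->data;
            size_t cnt = sd->len / sizeof(struct setup_ptr);

            for (size_t i = 0; i < cnt; i++, n++) {
                if (n < max) {
                    out[n].start = p[i].addr;
                    out[n].size  = p[i].size;
                }
            }
        }
        pa = sd->next;
    }
    return n;
}
```

The ordering check only warns, matching the "WARN if it is not" in item 3; refusing to boot on a misordered list would be a separate policy decision.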