Re: [PATCH 1/2] x86/efi: Map EFI memmap entries in-order at runtime

From: Ingo Molnar
Date: Sat Sep 26 2015 - 01:57:01 EST



So this commit worries me.

This bug is a good find, and the fix is obviously needed and urgent, but I'm not
sure about the implementation at all. (I've Cc:-ed a few more x86 low level
gents.)

* Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:

> /*
> + * Iterate the EFI memory map in reverse order because the regions
> + * will be mapped top-down. The end result is the same as if we had
> + * mapped things forward, but doesn't require us to change the
> + * existing implementation of efi_map_region().
> + */
> +static inline void *efi_map_next_entry_reverse(void *entry)
> +{
> + /* Initial call */
> + if (!entry)
> + return memmap.map_end - memmap.desc_size;
> +
> + entry -= memmap.desc_size;
> + if (entry < memmap.map)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
> + * efi_map_next_entry - Return the next EFI memory map descriptor
> + * @entry: Previous EFI memory map descriptor
> + *
> + * This is a helper function to iterate over the EFI memory map, which
> + * we do in different orders depending on the current configuration.
> + *
> + * To begin traversing the memory map @entry must be %NULL.
> + *
> + * Returns %NULL when we reach the end of the memory map.
> + */
> +static void *efi_map_next_entry(void *entry)
> +{
> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
> + /*
> + * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
> + * config table feature requires us to map all entries
> + * in the same order as they appear in the EFI memory
> + * map. That is to say, entry N must have a lower
> + * virtual address than entry N+1. This is because the
> + * firmware toolchain leaves relative references in
> + * the code/data sections, which are split and become
> + * separate EFI memory regions. Mapping things
> + * out-of-order leads to the firmware accessing
> + * unmapped addresses.
> + *
> + * Since we need to map things this way whether or not
> + * the kernel actually makes use of
> + * EFI_PROPERTIES_TABLE, let's just switch to this
> + * scheme by default for 64-bit.

The thing is, if relative accesses between these 'sections' do happen then the
requirement is much stronger than just 'ordered by addresses' - then we must map
them continuously and as a single block!

So at minimum the comment should say that. But I think we want more:


> + */
> + return efi_map_next_entry_reverse(entry);
> + }
> +
> + /* Initial call */
> + if (!entry)
> + return memmap.map;
> +
> + entry += memmap.desc_size;
> + if (entry >= memmap.map_end)
> + return NULL;
> +
> + return entry;
> +}
> +
> +/*
> * Map the efi memory ranges of the runtime services and update new_mmap with
> * virtual addresses.
> */
> @@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
> unsigned long left = 0;
> efi_memory_desc_t *md;
>
> - for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> + p = NULL;
> + while ((p = efi_map_next_entry(p))) {
> md = p;
> if (!(md->attribute & EFI_MEMORY_RUNTIME)) {

So why is this 64-bit only? Is 32-bit not affected because there we allocate
virtual addresses bottom-up?

This would be a lot clearer if we just mapped the entries in order, no questions
asked. Conditions like this:

> + if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {

... just invite confusion and possible corner cases where we end up mapping them
wrong.

So could we make the whole code obviously bottom-up? Such as first calculating the
size of virtual memory needed, then allocating a _single_, obviously continuous
mapping, and then doing a very clear in-order mapping within that window? That
would remove any bitness and legacy dependencies.

Good virtual memory layout is so critical for any third party code that this
should be the central property of the whole approach, not just some side condition
somewhere in an iteration loop.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/