Re: [PATCH 2/2 v5] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

From: lijiang
Date: Wed Nov 07 2018 - 04:11:07 EST


On 2018-11-07 13:23, Baoquan He wrote:
> On 11/07/18 at 01:00pm, Lianbo Jiang wrote:
>> E820 reserved ranges are useful in the kdump kernel; they have already
>> been added in the kexec-tools code.
>>
>> One reason is that PCI mmconf (extended mode) requires the reserved region;
>> otherwise it falls back to legacy mode and outputs the following kernel log.
>
> OK, it falls back to legacy mode and also outputs a kernel log. Apart from
> that, does it crash the kernel? Does the kdump kernel hang? Can we leave it
> if it only outputs a kernel log?
>
>>
>> Example:
>> ......
>> [ 19.798354] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>> [ 19.800653] [Firmware Info]: PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] not reserved in ACPI motherboard resources
>> [ 19.800995] PCI: not using MMCONFIG
>> ......
>>
>> The correct kernel log is like this:
>> ......
>> [ 0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
>> [ 0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
>> ......
>>
>> Furthermore, for AMD SME kdump support, the DMI table area must be mapped
>> as decrypted. On a normal boot, these ranges sit in e820 reserved ranges,
>> so the early ioremap code naturally maps them as decrypted. If the kdump
>> kernel has the same e820 reserved setup, it will just work as in the
>> normal kernel.
>
> Why do we care? If we don't fix it, what happens?
>
> Lianbo, for a bug fix, please describe the problem, then give the
> analysis of the root cause.
>

Thanks for your detailed comments.

In fact, these patches are quite simple. As the subject says, this patch
[PATCH 2/2] adds the reserved e820 ranges to the kdump kernel e820 table. The
first patch [PATCH 1/2] makes sure that only ranges of type E820_TYPE_RESERVED
are added to the kdump kernel e820 table, that is, it filters out the
unnecessary types (E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN).
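
As a rough user-space illustration of that filtering idea (this is not the
kernel code: the enum values, struct range and copy_reserved_ranges() are
hypothetical stand-ins, and the real patches walk the iomem resources rather
than an array), only the reserved entries end up in the second table:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's E820_TYPE_* values. */
enum range_type {
	RANGE_RAM,
	RANGE_RESERVED,
	RANGE_UNUSABLE,
	RANGE_RESERVED_KERN,
};

struct range {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
	enum range_type type;
};

/*
 * Copy only the RANGE_RESERVED entries of src[] into dst[] and return
 * how many were copied. At most dst_cap entries are written.
 */
size_t copy_reserved_ranges(const struct range *src, size_t n,
			    struct range *dst, size_t dst_cap)
{
	size_t i, out = 0;

	assert(src != NULL && dst != NULL);

	for (i = 0; i < n && out < dst_cap; i++) {
		if (src[i].type != RANGE_RESERVED)
			continue;	/* drop RAM, unusable, reserved-kern */
		dst[out++] = src[i];
	}

	return out;
}
```

Given a source table that mixes RAM, unusable and reserved entries, the
destination receives the reserved entries only, in their original order.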

At present, when we use kexec to load the kernel image and initramfs (for
example: kexec -s -p xxxx), the latest kernel does not pass the e820 reserved
ranges to the second kernel, which can cause two problems:

The first one is the MMCONFIG issue. Although it does not make the system
crash or hang, it is still a potential risk: my tests can't cover all cases
because I only have a limited set of machines, and I'm not sure what will
happen on other machines.

The second issue is that the e820 reserved ranges are not set up in the kdump
kernel, so functions that depend on the e820 reserved ranges no longer behave
correctly. For example:

early_memremap()->
early_memremap_pgprot_adjust()->
memremap_should_map_decrypted()->
e820__get_entry_type()

Please focus on two of these functions: early_memremap_pgprot_adjust() and
memremap_should_map_decrypted().

In the first kernel, these ranges sit in the e820 reserved ranges, so
memremap_should_map_decrypted() returns true, that is, the reserved memory is
treated as decrypted, and early_memremap_pgprot_adjust() calls
pgprot_decrypted() to clear the memory encryption mask.

In the second kernel, the e820 reserved ranges are not passed in, so these
ranges do not sit in the e820 reserved ranges. memremap_should_map_decrypted()
therefore returns false, that is, the reserved memory is treated as encrypted,
and early_memremap_pgprot_adjust() calls pgprot_encrypted() to set the memory
encryption mask.

But in the second kernel the e820 reserved memory is in fact still decrypted,
so the mapping is wrong. Without the fix, kdump won't work when the kernel
image and initramfs are loaded with the command (kexec -s -p xxx).

Hope this helps.

Thanks,
Lianbo

pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
					     unsigned long size,
					     pgprot_t prot)
{
	bool encrypted_prot;

	if (!mem_encrypt_active())
		return prot;

	encrypted_prot = true;

	//......

	if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
		encrypted_prot = false;

	return encrypted_prot ? pgprot_encrypted(prot)
			      : pgprot_decrypted(prot);
}

static bool memremap_should_map_decrypted(resource_size_t phys_addr,
					  unsigned long size)
{
	int is_pmem;

	//......

	/* Check if the address is outside kernel usable area */
	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
	case E820_TYPE_RESERVED:
	case E820_TYPE_ACPI:
	case E820_TYPE_NVS:
	case E820_TYPE_UNUSABLE:
		/* For SEV, these areas are encrypted */
		if (sev_active())
			break;
		/* Fallthrough */

	case E820_TYPE_PRAM:
		return true;
	default:
		break;
	}

	return false;
}
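
To make the difference between the two kernels concrete, here is a simplified
user-space model of the lookup above. It is only a sketch under stated
assumptions: enum model_type, struct model_range, lookup_type() and
should_map_decrypted() are hypothetical names, only the reserved type is
modeled (no SEV, ACPI, NVS or PRAM handling), a range matches only if a single
entry fully contains it, and size is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few E820_TYPE_* values. */
enum model_type {
	MODEL_NONE,		/* no table entry covers the range */
	MODEL_RAM,
	MODEL_RESERVED,
};

struct model_range {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
	enum model_type type;
};

/*
 * Return the type of the first entry that fully contains [start, end],
 * or MODEL_NONE if no entry does.
 */
enum model_type lookup_type(const struct model_range *tbl, size_t n,
			    unsigned long long start, unsigned long long end)
{
	size_t i;

	assert(start <= end);

	for (i = 0; i < n; i++) {
		if (start >= tbl[i].start && end <= tbl[i].end)
			return tbl[i].type;
	}

	return MODEL_NONE;
}

/* Simplified model: only a reserved entry makes the mapping decrypted. */
bool should_map_decrypted(const struct model_range *tbl, size_t n,
			  unsigned long long phys, unsigned long long size)
{
	return lookup_type(tbl, n, phys, phys + size - 1) == MODEL_RESERVED;
}
```

With a table that contains the reserved entry (the first kernel), a lookup
inside that entry yields true. With a table that lacks it (the second kernel
without this patch), the same lookup yields false, which is the wrong answer
for memory that is actually decrypted.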


>
>>
>> Suggested-by: Dave Young <dyoung@xxxxxxxxxx>
>> Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx>
>> ---
>> arch/x86/kernel/crash.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index ae724a6e0a5f..d3167125800e 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -384,6 +384,10 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
>> walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
>> memmap_entry_callback);
>>
>> + cmd.type = E820_TYPE_RESERVED;
>> + walk_iomem_res_desc(IORES_DESC_NONE, 0, 0, -1, &cmd,
>> + memmap_entry_callback);
>> +
>> /* Add crashk_low_res region */
>> if (crashk_low_res.end) {
>> ei.addr = crashk_low_res.start;
>> --
>> 2.17.1
>>