Re: [PATCH 1/2 v6] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name

From: lijiang
Date: Thu Nov 15 2018 - 22:28:51 EST


On 2018-11-15 18:39, Borislav Petkov wrote:
> + Bjorn.
>
> On Thu, Nov 15, 2018 at 01:44:07PM +0800, lijiang wrote:
>> At present, the upstream kernel does not pass the e820 reserved ranges to the
>> second kernel, which might cause two problems:
>>
>> The first one is the MMCONFIG issue, the PCI MMCONFIG(extended mode) requires
>> the reserved region otherwise it falls back to legacy mode, which might lead to
>> the hot-plug device could not be recognized in kdump kernel.
>
> Well, this still doesn't explain it fully. Let's look at a box:
>
> [ 0.000000] e820: BIOS-provided physical RAM map:
> [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000997ff] usable
> [ 0.000000] BIOS-e820: [mem 0x0000000000099800-0x000000000009ffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000065642fff] usable
> [ 0.000000] BIOS-e820: [mem 0x0000000065643000-0x0000000067fb8fff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000067fb9000-0x00000000689e8fff] ACPI NVS
> [ 0.000000] BIOS-e820: [mem 0x00000000689e9000-0x0000000068bf5fff] ACPI data
> [ 0.000000] BIOS-e820: [mem 0x0000000068bf6000-0x000000006f7fffff] usable
> [ 0.000000] BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000fec80000-0x00000000fed00fff] reserved
> [ 0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000001007fffff] reserved
> [ 0.000000] BIOS-e820: [mem 0x0000000100800000-0x000000603fffffff] usable
>
> this one has 8 reserved regions. Does that mean that we need to pass
> them *all* 8 to the second kernel so that MMCONFIG works?
>
> Or is it only one reserved region which is needed for MMCONFIG?
>

On my machine, the PCI MMCONFIG region [mem 0x80000000-0x8fffffff] is reserved
in e820. This address range lies within the e820 reserved region
[mem 0x0000000078000000-0x000000008fffffff].

The kernel prints the following log:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000008bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000008c000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000029920fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000029921000-0x0000000029921fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000029922000-0x0000000062278fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000062279000-0x0000000062378fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000062379000-0x000000006238bfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000006238c000-0x000000006238cfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x000000006238d000-0x000000006240bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000006240c000-0x000000006264bfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000006264c000-0x000000006266dfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000006266e000-0x00000000626cdfff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000626ce000-0x000000006278dfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000006278e000-0x000000006278efff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000006278f000-0x0000000062807fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000062808000-0x000000006280afff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000006280b000-0x000000006280cfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000006280d000-0x000000006280dfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x000000006280e000-0x000000006286afff] usable
[ 0.000000] BIOS-e820: [mem 0x000000006286b000-0x000000006286efff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000006286f000-0x00000000682f8fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000682f9000-0x0000000068b05fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000068b06000-0x0000000068b09fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000068b0a000-0x0000000068b1afff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000068b1b000-0x0000000068b1dfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000068b1e000-0x0000000071d1dfff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000071d1e000-0x0000000071d2dfff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000071d2e000-0x0000000071d3dfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000071d3e000-0x0000000071d4dfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x0000000071d4e000-0x0000000077ffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000078000000-0x000000008fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000087effffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000087f000000-0x000000087fffffff] reserved

......
[ 0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[ 0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
......


For the PCI MMCONFIG issue alone, on this machine it would be enough to pass only
the e820 reserved region [mem 0x0000000078000000-0x000000008fffffff] to the second
kernel, but the PCI MMCONFIG region differs from machine to machine.
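
To make the containment argument concrete, here is a small userspace sketch in C.
It is only a model, not kernel code: the mm_* names and types are hypothetical,
the table holds a few entries copied from the log above, and it only looks for a
single reserved entry that covers the whole range (the kernel's own check is more
general than this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's e820 types (illustration only). */
enum mm_e820_type { MM_USABLE, MM_RESERVED };

struct mm_e820_entry {
	uint64_t start;			/* first byte of the range */
	uint64_t end;			/* last byte of the range, inclusive */
	enum mm_e820_type type;
};

#define MM_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* A few entries copied from the boot log of my machine. */
static const struct mm_e820_entry mm_table[] = {
	{ 0x0000000071d4e000ULL, 0x0000000077ffffffULL, MM_USABLE },
	{ 0x0000000078000000ULL, 0x000000008fffffffULL, MM_RESERVED },
	{ 0x00000000fed80000ULL, 0x00000000fed80fffULL, MM_RESERVED },
	{ 0x0000000100000000ULL, 0x000000087effffffULL, MM_USABLE },
};

/*
 * Return the reserved entry that fully covers [start, end], or NULL if
 * there is none. With an empty table (no reserved ranges passed to the
 * second kernel) this always returns NULL.
 */
static const struct mm_e820_entry *
mm_find_reserved_covering(const struct mm_e820_entry *table, size_t n,
			  uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < n; i++) {
		if (table[i].type == MM_RESERVED &&
		    table[i].start <= start && end <= table[i].end)
			return &table[i];
	}
	return NULL;
}
```

Calling mm_find_reserved_covering() with the MMCONFIG range 0x80000000-0x8fffffff
finds the entry starting at 0x78000000. With a table that has no reserved entries,
nothing is found, which corresponds to the case where MMCONFIG falls back to
legacy mode.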

In addition, there is a more serious problem: kdump does not work at all on some
machines.

> Bjorn, do you know what the detection logic should be to map the correct
> reserved region (or regions) for MMCONFIG?
>
> Now, even if we don't map that reserved region and MMCONFIG falls back
> to legacy mode, why is that a problem for the kdump kernel? Why does
> the kdump kernel need the hotplug device? What would be the use case?
> Hotplug a SATA drive to store the memory dump to it ... or?
>

A simple case: hot-plug a PCI network card and use ssh/nfs to dump the vmcore.
If the PCI MMCONFIG region is not reserved in the kdump kernel, the hot-plugged
PCI device cannot be recognized, so the PCI network card won't work.

>> Another one is that the e820 reserved ranges do not setup in kdump kernel, which
>> could cause kdump can't work in some machines. To know more information, please
>> refer to the [PATCH 2/2 v6] patch log.
>
> Yah, I still don't understand *why* we need the reserved ranges in the
> second kernel. Once we've figured out the *why* we can look at the *how*.
>

Here is an example from SME kdump. Maybe it helps to understand the problem.

The e820 reserved ranges are not set up in the kdump kernel, so the functions
that depend on the e820 reserved ranges give wrong results.

early_memremap()->
early_memremap_pgprot_adjust()->
memremap_should_map_decrypted()->
e820__get_entry_type()

Please focus on two of these functions: early_memremap_pgprot_adjust() and
memremap_should_map_decrypted().

In the first kernel, the ranges being mapped sit in the e820 reserved ranges,
so memremap_should_map_decrypted() returns true, that is to say, the
reserved memory is treated as decrypted, and early_memremap_pgprot_adjust()
calls pgprot_decrypted() to clear the memory encryption mask.

In the second kernel, because the e820 reserved ranges are not passed to it,
the same ranges no longer sit in the e820 reserved ranges, so
memremap_should_map_decrypted() returns false, that is to say, the reserved
memory is treated as encrypted, and early_memremap_pgprot_adjust() calls
pgprot_encrypted() to set the memory encryption mask.

In fact, the e820 reserved memory is still decrypted in the second kernel, so
the mapping is wrong. This issue must be fixed, otherwise kdump won't work in
this case.
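
The difference between the two kernels can be shown with a small userspace model
in C. This is only a sketch under assumptions, not the kernel code: the sme_*
names are hypothetical, SME is assumed to be active, only the e820 reserved case
is modeled (the real functions have more checks that are left out here), and the
second kernel's table is a made-up example that contains no reserved entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified e820 types for this model. */
enum sme_type { SME_NONE, SME_RAM, SME_RESERVED };

struct sme_entry {
	uint64_t start;		/* first byte of the range */
	uint64_t end;		/* last byte of the range, inclusive */
	enum sme_type type;
};

#define SME_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* First kernel: the reserved range from my log is present. */
static const struct sme_entry sme_first_kernel[] = {
	{ 0x0000000078000000ULL, 0x000000008fffffffULL, SME_RESERVED },
	{ 0x0000000100000000ULL, 0x000000087effffffULL, SME_RAM },
};

/* Second kernel: made-up table, reserved ranges were not passed on. */
static const struct sme_entry sme_second_kernel[] = {
	{ 0x0000000030000000ULL, 0x000000003fffffffULL, SME_RAM },
};

/* Models e820__get_entry_type(): type of the entry covering [start, end]. */
static enum sme_type sme_get_entry_type(const struct sme_entry *t, size_t n,
					uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < n; i++) {
		if (t[i].start <= start && end <= t[i].end)
			return t[i].type;
	}
	return SME_NONE;
}

/* Models memremap_should_map_decrypted(), reserved case only. */
static bool sme_should_map_decrypted(const struct sme_entry *t, size_t n,
				     uint64_t start, uint64_t end)
{
	return sme_get_entry_type(t, n, start, end) == SME_RESERVED;
}

/*
 * Models the outcome of early_memremap_pgprot_adjust(): true means the
 * mapping gets the encryption mask, false means the mask is cleared.
 */
static bool sme_maps_encrypted(const struct sme_entry *t, size_t n,
			       uint64_t start, uint64_t end)
{
	return !sme_should_map_decrypted(t, n, start, end);
}
```

For the same physical range, for example 0x80000000-0x80000fff,
sme_maps_encrypted() gives false with sme_first_kernel and true with
sme_second_kernel, although the memory itself has not changed.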

The e820 reserved ranges are needed in the kdump kernel, so it is necessary to
pass them to the kdump kernel.

Hope this is helpful.

Thanks,
Lianbo

> Thx.
>