Re: Early boot regression from f0551af0213 ("x86/topology: Ignore non-present APIC IDs in a present package")

From: Mario Limonciello
Date: Wed May 08 2024 - 18:09:46 EST


On 5/8/2024 16:47, Thomas Gleixner wrote:
Mario!

On Thu, May 02 2024 at 05:33, Mario Limonciello wrote:
On 4/25/2024 16:42, Thomas Gleixner wrote:
Right, that's what we saw with the debug patch. The ACPI/MADT table
is clearly bonkers. The effect of it is that it pretends that the system
has 16 possible CPUs:

[ 0.089381] CPU topo: Allowing 8 present CPUs plus 8 hotplug CPUs

Which in turn changes the sizing of the per CPU data and affects some
other details which depend on the number of possible CPUs.

I suspect at least this aspect is caused by commit
fed8d8773b8ea68ad99d9eee8c8343bef9da2c2c.

If you try reverting that, I expect the "hotplug CPUs" to disappear.

That does not solve anything.

The topology core already rejects those CPUs and accounts only for 8,
which in turn causes the boot to fail as also demonstrated by limiting
the number of possible CPUs to 8.

There is some other problem with this broken BIOS/ACPI.

Something very commonly done in BIOSes on AMD systems is that the MADT has "entries" for the maximum number of CPUs that can be present. For example, if the platform supports up to 12 cores, the BIOS will have the same number of entries (probably 24, counting SMT) whether you buy the 8-core or the 12-core part. With the 8-core part, only 16 of them would end up populated.

Looking at Lyude's logs, that system predates the introduction of ACPI 6.3, which is why I suggested that reverting that commit might at least stop the kernel from claiming that it saw a number of hotplug CPUs.
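A simplified model of the classification discussed here might look like the following sketch. It assumes only the flag layout of the MADT Processor Local APIC structure (bit 0 "Enabled", bit 1 "Online Capable", the latter defined since ACPI 6.3). The struct, the `classify_madt()` function and its `acpi_6_3_or_later` parameter are hypothetical; this is not the kernel's actual code, just an illustration of why disabled entries on pre-6.3 firmware can end up counted as hotplug CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the MADT Processor Local APIC structure (ACPI spec). */
#define MADT_LAPIC_ENABLED        (1u << 0)
#define MADT_LAPIC_ONLINE_CAPABLE (1u << 1) /* defined since ACPI 6.3 */

/* Hypothetical, simplified view of one MADT CPU entry. */
struct madt_cpu_entry {
	uint32_t apic_id;
	uint32_t flags;
};

struct cpu_counts {
	unsigned int present;
	unsigned int hotplug;
};

/*
 * Hypothetical classifier: enabled entries are present CPUs. A disabled
 * entry counts as a hotplug CPU if it is marked online capable, or if the
 * firmware predates ACPI 6.3, where bit 1 is reserved and therefore cannot
 * be used to rule hotplug out. Anything else is ignored as unusable.
 */
static struct cpu_counts classify_madt(const struct madt_cpu_entry *e,
				       size_t n, bool acpi_6_3_or_later)
{
	struct cpu_counts c = { 0, 0 };

	assert(e != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (e[i].flags & MADT_LAPIC_ENABLED)
			c.present++;
		else if (!acpi_6_3_or_later ||
			 (e[i].flags & MADT_LAPIC_ONLINE_CAPABLE))
			c.hotplug++;
		/* else: neither enabled nor online capable, not usable */
	}
	return c;
}
```

Fed a 16-entry table where only the first 8 entries are enabled and the rest have all flags clear, this model reports 8 present plus 8 hotplug CPUs when the firmware is treated as pre-6.3, and 8 present plus 0 hotplug CPUs otherwise.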

But yes, I agree it probably won't help the overall issue that started this thread.