Re: [RFC][PATCH] x86/smp: Fix __max_logical_packages value setup

From: Peter Zijlstra
Date: Wed Aug 10 2016 - 15:13:24 EST


On Wed, Aug 03, 2016 at 06:23:58PM +0200, Jiri Olsa wrote:
> Frank reported a kernel panic when he disabled several cores in the BIOS
> via the following option:
>
> Core Disable Bitmap(Hex) [0]
>
> with the value 0xFFE, which leaves 16 CPUs in the system (out of 48).

That seems like a daft BIOS option. How widespread is that? I can't
remember ever seeing that.

> The reason for the panic is a wrong value of __max_logical_packages,
> which leaves logical_package_map uninitialized, while the uncore code
> relies on this map being properly initialized (maybe we should
> add some safety checks there as well).
>
> The __max_logical_packages is computed as:
>
> DIV_ROUND_UP(total_cpus, ncpus);
> - ncpus being the number of cores per package
>
> With the above BIOS setup we get total_cpus == 16, which sets
> __max_logical_packages to 2 (ncpus is 12).
>
> Once topology_update_package_map() processes a CPU from a package
> beyond that limit of 2, we display the above messages and fail to
> initialize the physical_to_logical_pkg map, which makes the uncore
> code crash.
>
> The fix is to set __max_logical_packages directly to total_cpus,
> which should bound the maximum possible logical package ID in any
> case.
>
> Reported-by: Frank Ramsay <framsay@xxxxxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 157bf0957219..484f7d357c77 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -340,7 +340,7 @@ static void __init smp_init_package_map(void)
>  		ncpus = 1;
>  	}
>
> -	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
> +	__max_logical_packages = total_cpus;

This seems undesirable; it would grow the bitmap unnecessarily large on
most setups.

Is there no way to detect the brain damage inflicted by that BIOS option
and fudge ncpus in that case?