Re: [RFC PATCH V2 0/1] x86: cpu topology fix and question on x86_max_cores

From: Peter Zijlstra
Date: Mon Feb 20 2023 - 05:37:32 EST



On Mon, Feb 20, 2023 at 11:28:55AM +0800, Zhang Rui wrote:

> Solution for fixing smp_num_siblings
> ------------------------------------
>
> Patch 1/1 ensures that smp_num_siblings represents the system-wide maximum
> number of siblings by always increasing its value. Never allow it to
> decrease.
>
> It is sufficient to make the problem go away.
>
> However, there is a potential problem left. That is, when the boot CPU is
> an Ecore CPU, smp_num_siblings is set to 1 during BSP probe, kernel disables
> SMT support by setting cpu_smt_control to CPU_SMT_NOT_SUPPORTED in
> start_kernel()->check_bugs()->cpu_smt_check_topology().
> So far, we don't have such platforms.

This is the oft-recurring problem of the boot CPU not having access to
the system topology.

Instead of fixing that, Intel seems to work at making it worse. At some
point, we're just going to have to give up and move to DT or something
:/

Please communicate (again), that only knowing the topology/setup of the
system once all the CPUs are online is crap. Once you start bringing up
APs some things are fixed -- if we guessed wrong, we're hosed.

Specific examples of this that we've run into in the past are:

- does the machine have SMT
- is the machine Hybrid
(and if so, how many different core types will we have)

Specifically, things like determining the number of GP event counters on
a PMU sometimes depends on HT being active, but we want the PMU
initialized really early because it also serves the watchdog and you want
splats when something goes sideways.

The end result is that we have to make things complicated and
dynamically re-adjust when system resources come online.
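
To make that concrete, here is a rough user-space sketch of the pattern, not
actual kernel code. It assumes a made-up rule (8 GP counters per core without
SMT, 4 per thread with SMT), and every name in it (max_siblings, pmu_counters,
counters_for(), pmu_init_early(), cpu_online_update()) is hypothetical.
max_siblings merely stands in for smp_num_siblings and, like in patch 1/1,
is only ever allowed to grow:

```c
#include <stdio.h>

/* Hypothetical model state; stands in for smp_num_siblings. */
static unsigned int max_siblings = 1;
/* Hypothetical: GP counters the PMU currently believes it may use. */
static unsigned int pmu_counters;

/* Assumed rule for illustration only: SMT halves the per-thread counters. */
static unsigned int counters_for(unsigned int siblings)
{
	return siblings > 1 ? 4 : 8;
}

/* Early init: all we know is what the boot CPU reports. */
static void pmu_init_early(unsigned int boot_cpu_siblings)
{
	max_siblings = boot_cpu_siblings;
	pmu_counters = counters_for(max_siblings);
}

/*
 * Called for each CPU that comes online later.
 * The sibling count never decreases; if it grows, the early guess was
 * wrong and dependent state has to be re-adjusted.
 * Returns 1 when a re-adjustment happened, 0 otherwise.
 */
static int cpu_online_update(unsigned int siblings)
{
	if (siblings <= max_siblings)
		return 0;

	max_siblings = siblings;
	pmu_counters = counters_for(max_siblings);
	printf("topology changed: siblings=%u, counters=%u\n",
	       max_siblings, pmu_counters);
	return 1;
}
```

Booting on a CPU that reports one sibling and then onlining one that reports
two forces the counter budget to be recomputed after the PMU is already in
use, which is exactly the kind of late fixup we would rather not need.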

So far we've managed -- just, but *PLEASE*, don't make it worse!!!