Re: [PATCH] x86/tsc: Use topology_max_packages() to get package number

From: Dave Hansen
Date: Mon Mar 18 2024 - 12:36:08 EST


On 3/15/24 04:26, Feng Tang wrote:
> Thomas' recent patchset of refactoring x86 topology code introduces
> topology_max_package(), which works well in most of the above cases.
> The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which
> sets up the 'nr_cpu_ids' and rejects the rest of the CPUs, and may
> cause topology_max_package() less than the real package number, but
> it's fine as it is rarely used debug option, and logical package
> number really matters in this check. So use the more accurate
> topology_max_package() to replace nr_online_nodes().

This is a lot longer than what you have in this changelog, but I hope it
makes sense. This is retreading some of what Thomas's set does, but I
think it's worth re-explaining:

The boot CPU iterates through all enumerated APIC ids and either accepts
or rejects the APIC id. For accepted ids, it figures out which bits of
the id map to the package number. It tracks which package numbers have
been seen in a bitmap. topology_max_package() just returns the number
of bits set in that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC ids to be rejected
and can artificially lower the number of bits in the package bitmap and
thus topology_max_package().

This means that, for example, a system with 8 physical packages might
reject all the CPUs on 6 of those packages and be left with only 2
packages and 2 bits set in the package bitmap. It needs the TSC
watchdog, but would disable it anyway. This isn't ideal, but this only
happens for debug-oriented options.

This is fixable by tracking the package numbers for rejected CPUs. But
it's not worth the trouble for debugging.

--

The only thing that comes to mind is if we need something simple like:

if (topo_info.nr_rejected_cpus)
pr_info("TSC might be buggered\n");

.. somewhere.