Re: [Patch v2 2/2] x86/tsc: use logical_packages as a better estimation of socket numbers

From: Zhang, Rui
Date: Fri Jun 16 2023 - 05:19:33 EST


On Fri, 2023-06-16 at 10:10 +0200, Peter Zijlstra wrote:
> On Fri, Jun 16, 2023 at 10:02:31AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 16, 2023 at 06:53:21AM +0000, Zhang, Rui wrote:
> > > On Thu, 2023-06-15 at 11:20 +0200, Peter Zijlstra wrote:
> >
> > > > So I have at least two machines where I boot with 'possible_cpus=#'
> > > > because the BIOS MADT is reporting a stupid number of CPUs that
> > > > aren't actually there.
> > >
> > > Does the MADT report those CPUs as disabled but online capable?
> > > Can you send me a copy of the acpidump?
> >
> > Sent privately, it's a bit big.
>
> So if I remove 'possible_cpus=40' it does crazy shit like this:
>
> [    1.268447] setup_percpu: NR_CPUS:512 nr_cpumask_bits:160 nr_cpu_ids:160 nr_node_ids:2
>
> [    1.303567] pcpu-alloc: [0] 000 001 002 003 004 005 006 007
> [    1.309871] pcpu-alloc: [0] 008 009 020 021 022 023 024 025
> [    1.316172] pcpu-alloc: [0] 026 027 028 029 040 042 044 046
> [    1.322475] pcpu-alloc: [0] 048 050 052 054 056 058 060 062
> [    1.328777] pcpu-alloc: [0] 064 066 068 070 072 074 076 078
> [    1.335084] pcpu-alloc: [0] 080 082 084 086 088 090 092 094
> [    1.341387] pcpu-alloc: [0] 096 098 100 102 104 106 108 110
> [    1.347688] pcpu-alloc: [0] 112 114 116 118 120 122 124 126
> [    1.353992] pcpu-alloc: [0] 128 130 132 134 136 138 140 142
> [    1.360293] pcpu-alloc: [0] 144 146 148 150 152 154 156 158
> [    1.366596] pcpu-alloc: [1] 010 011 012 013 014 015 016 017
> [    1.372900] pcpu-alloc: [1] 018 019 030 031 032 033 034 035
> [    1.379201] pcpu-alloc: [1] 036 037 038 039 041 043 045 047
> [    1.385504] pcpu-alloc: [1] 049 051 053 055 057 059 061 063
> [    1.391806] pcpu-alloc: [1] 065 067 069 071 073 075 077 079
> [    1.398109] pcpu-alloc: [1] 081 083 085 087 089 091 093 095
> [    1.404411] pcpu-alloc: [1] 097 099 101 103 105 107 109 111
> [    1.410714] pcpu-alloc: [1] 113 115 117 119 121 123 125 127
> [    1.417016] pcpu-alloc: [1] 129 131 133 135 137 139 141 143
> [    1.423319] pcpu-alloc: [1] 145 147 149 151 153 155 157 159
>
> [    2.110382] smp: Bringing up secondary CPUs ...
> [    2.112255] x86: Booting SMP configuration:
> [    2.113253] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7   #8   #9
> [    2.221253] .... node  #1, CPUs:    #10
> [    0.163522] smpboot: CPU 10 Converting physical 0 to logical die 1
> [    2.337372]   #11  #12  #13  #14  #15  #16  #17  #18  #19
> [    2.504253] .... node  #0, CPUs:    #20  #21  #22  #23  #24  #25  #26  #27  #28  #29
> [    2.563253] .... node  #1, CPUs:    #30  #31  #32  #33  #34  #35  #36  #37  #38  #39
> [    2.662321] smp: Brought up 2 nodes, 40 CPUs
> [    2.664257] smpboot: Max logical packages: 8
>
> It is an IVB-EP with *2* sockets, 10 cores and SMT, 40 is right, 160 is
> quite insane.

According to the MADT, there are indeed 40 valid CPUs, and then 80 CPU
entries with

APIC ID : FF
enabled : 0
Online capable : 0

A dumb question: why are these CPUs added into the possible_mask?
I can dig into this later, but I don't have a quick answer at the
moment.
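
To make the question concrete, here is a minimal sketch of how the two
MADT Local APIC flag bits could be classified, assuming the ACPI 6.3
semantics (bit 0 = Enabled, bit 1 = Online Capable; firmware written
against older ACPI revisions leaves bit 1 reserved as zero). The struct
and function names are made up for illustration and this is not the
kernel's actual enumeration code. Under this reading, the entries
above, with both bits clear, would not count as possible CPUs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the MADT Processor Local APIC entry (ACPI 6.3 layout). */
#define MADT_LAPIC_ENABLED        (1u << 0)
#define MADT_LAPIC_ONLINE_CAPABLE (1u << 1)

/* Hypothetical, simplified view of one MADT Local APIC entry. */
struct madt_lapic_sketch {
	uint8_t  apic_id;
	uint32_t flags;
};

/* Usable at boot: firmware set the Enabled bit. */
static bool lapic_is_enabled(const struct madt_lapic_sketch *e)
{
	return e->flags & MADT_LAPIC_ENABLED;
}

/* Not usable now, but firmware says it may be brought online later. */
static bool lapic_is_hotpluggable(const struct madt_lapic_sketch *e)
{
	return !(e->flags & MADT_LAPIC_ENABLED) &&
	       (e->flags & MADT_LAPIC_ONLINE_CAPABLE);
}

/*
 * Count the entries that would be "possible" under the assumption
 * stated above: enabled now, or disabled but online capable.
 * Entries with both bits clear are skipped.
 */
static size_t count_possible_sketch(const struct madt_lapic_sketch *e,
				    size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (lapic_is_enabled(&e[i]) || lapic_is_hotpluggable(&e[i]))
			count++;

	return count;
}
```

Feeding it the 40 enabled entries plus the 80 entries with enabled : 0
and Online capable : 0 would give 40 under this interpretation, which
matches the real CPU count on that machine.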

thanks,
rui