Re: [patch] x86: 2.6.31-rc7 crash due to buggy flat_phys_pkg_id

From: Cyrill Gorcunov
Date: Tue Aug 25 2009 - 14:32:12 EST


[Ingo Molnar - Tue, Aug 25, 2009 at 08:15:00PM +0200]
|
| * Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:
|
| > On Mon, Aug 24, 2009 at 10:12:01PM -0700, Yinghai Lu wrote:
| > >Ravikiran G Thirumalai wrote:
| > >> On Mon, Aug 24, 2009 at 04:53:45PM -0700, Yinghai Lu wrote:
| > >>> Ravikiran G Thirumalai wrote:
| > >>>> Signed-off-by: Ravikiran Thirumalai <kiran@xxxxxxxxxxxx>
| > >>>> Cc: Yinghai Lu <yinghai@xxxxxxxxxx>
| > >>>>
| > >>>
| > >>
| > >> Why? The specs seem to indicate otherwise unless I am mistaken --
| > >> Intel systems programming guide, Vol 3A Part1, chapter 7 section
| > >> 7.5.5 - Identifying Logical Processors in a MP system:
| > >> <quote>
| > >> After the BIOS has completed the MP initialization protocol, each logical
| > >> processor can be uniquely identified by its local APIC ID. Software can
| > >> access these APIC IDs in either of the following ways
| > >> </quote>
| > >> phys_pkg_id() indicates that the logical package id is being looked up,
| > >> so local apic id should be used here, no?
| > >> What am I missing?
| > >
| > >initial apic id: it cannot be changed; there is a fixed mapping from it to the physical processor id, aka socket id / node id.
| > >
| > >apic id: it could be changed by the BIOS to any value; there is no good way to get phys_pkg_id from it.
| > >
| >
| > But BIOS is supposed to change it to a sane value. Until 2.6.30,
| > local apic id has been used to get phys_pkg_id for the 'flat'
| > apics! What changed? Was this changed for a BIOS bug? Even the
| > intel books seem to indicate local apic usage!
|
| We should revert to the .30 behavior unless there's a good reason
| (even in that case we'll solve the regression and do a workaround
| for vSMP). Yinghai?
|
| Ingo

I'm definitely not an APIC expert, but since I was partially involved,
let me chime in.

The original commit that causes the problem for vSMP seems to have been
motivated by the case where the cpu_has_apic bit is turned off (i.e. the
APIC was manually disabled or the ACPI tables are broken), so subsequent
reads of the APIC ID return plain zero (we're talking about 64-bit here).
So, frankly, I don't understand what is wrong with Ravikiran's patch. If
the APIC is disabled, the initial APIC ID value will be used anyway (it
is latched, though it may actually be changed, but that's not our case).
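To make the two options concrete, here is a minimal C sketch of the
selection being argued about. It is NOT the kernel's flat_phys_pkg_id();
all function names are hypothetical, and index_msb is assumed to mean the
number of low APIC ID bits that select the core/thread inside a package.
The initial APIC ID is the 8-bit value CPUID leaf 1 reports in EBX[31:24];
the local APIC ID is whatever the BIOS programmed. The sketch prefers the
local APIC ID (the .30 behaviour) and falls back to the initial APIC ID
only when the APIC is unusable, since reading the APIC ID then yields 0.

```c
#include <stdint.h>

/* Hypothetical helper: CPUID leaf 1 returns the initial APIC ID in EBX[31:24]. */
unsigned int initial_apic_id_from_ebx(uint32_t ebx)
{
	return (ebx >> 24) & 0xffu;
}

/* Hypothetical helper: drop the low index_msb bits (core/thread within the
 * package) to get the package (socket) id. */
unsigned int pkg_id_from_apic_id(unsigned int apic_id, int index_msb)
{
	return apic_id >> index_msb;
}

/*
 * Hypothetical model of the choice under discussion:
 *  - has_apic != 0: use the BIOS-programmed local APIC ID (2.6.30 behaviour);
 *  - has_apic == 0: the APIC ID read would be 0, so use the initial APIC ID.
 */
unsigned int phys_pkg_id_sketch(int has_apic, unsigned int local_apic_id,
				unsigned int initial_apic_id, int index_msb)
{
	unsigned int id = has_apic ? local_apic_id : initial_apic_id;

	return pkg_id_from_apic_id(id, index_msb);
}
```

For example, with EBX = 0x05000800 the initial APIC ID is 5, and with
index_msb = 1 (two logical CPUs per package) the APIC-disabled path gives
package 2, while a BIOS-assigned local APIC ID of 0x12 gives package 9.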

Or perhaps there is an issue with SRAT NUMA node numbering?

-- Cyrill