Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

From: Jeremy Fitzhardinge
Date: Wed Jun 18 2008 - 01:35:40 EST


Mike Travis wrote:
Jeremy Fitzhardinge wrote:
Mike Travis wrote:
Ingo Molnar wrote:
* Mike Travis <travis@xxxxxxx> wrote:

* Declare the pda as a per cpu variable.

* Make the x86_64 per cpu area start at zero.

* Since the pda is now the first element of the per_cpu area, cpu_pda()
is no longer needed and per_cpu() can be used instead. This also makes
the _cpu_pda[] table obsolete.

* Since %gs is pointing to the pda, it will then also point to the per cpu
variables and can be accessed thusly:

%gs:[&per_cpu_xxxx - __per_cpu_start]
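The addressing scheme described above can be sketched in user space. This is
only an illustration, not the patch itself: struct pda, struct percpu_area,
cpu_base[], PERCPU_OFFSET() and percpu_ptr() are hypothetical names, and a
plain pointer array stands in for each CPU's %gs base. It shows that once the
area starts at zero and the pda is its first member, the pda's offset is 0 and
every other per cpu variable is reached as base + (address - area start).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's pda; fields are illustrative. */
struct pda {
    unsigned long kernelstack;
    unsigned int cpunumber;
};

/* Layout of one per cpu area: the pda comes first, so its offset is 0. */
struct percpu_area {
    struct pda pda;   /* first element of the area */
    int irq_count;    /* some other per cpu variable */
};

#define NR_CPUS 4

/* One base pointer per CPU; plays the role of that CPU's %gs base. */
static struct percpu_area *cpu_base[NR_CPUS];

/* Zero-based offset, i.e. &per_cpu_xxxx - __per_cpu_start. */
#define PERCPU_OFFSET(field) offsetof(struct percpu_area, field)

/* base + offset, which is what %gs:[offset] resolves to on hardware. */
static void *percpu_ptr(int cpu, size_t offset)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(offset < sizeof(struct percpu_area));
    return (char *)cpu_base[cpu] + offset;
}

/* Allocate a zeroed area per CPU and record the CPU number in its pda. */
static int setup_percpu_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpu_base[cpu] = calloc(1, sizeof(struct percpu_area));
        if (!cpu_base[cpu])
            return -1;
        cpu_base[cpu]->pda.cpunumber = (unsigned int)cpu;
    }
    return 0;
}
```

With this layout a separate pda lookup table is unnecessary: looking up the
pda of CPU n is the same operation as looking up any other per cpu variable,
just with offset 0.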

Based on linux-2.6.tip

-tip testing found an instantaneous reboot crash on 64-bit x86, with
this config:

http://redhat.com/~mingo/misc/config-Thu_Jun__5_11_43_51_CEST_2008.bad

there is no boot log as the instantaneous reboot happens before
anything is printed to the (early-) serial console. I have bisected
it down to:

| 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f is first bad commit
| commit 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f
| Author: Mike Travis <travis@xxxxxxx>
| Date: Tue Jun 3 17:30:21 2008 -0700
|
| x86_64: Fold pda into per cpu area

the big problem is not just this crash, but that the patch is _way_
too big:

 arch/x86/Kconfig                 |  3 +
 arch/x86/kernel/head64.c         | 34 ++++++--------
 arch/x86/kernel/irq_64.c         | 36 ++++++++-------
 arch/x86/kernel/setup.c          | 90 ++++++++++++---------------------------
 arch/x86/kernel/setup64.c        |  5 --
 arch/x86/kernel/smpboot.c        | 51 ----------------------
 arch/x86/kernel/traps_64.c       | 11 +++-
 arch/x86/kernel/vmlinux_64.lds.S |  1
 include/asm-x86/percpu.h         | 48 ++++++--------------
 9 files changed, 89 insertions(+), 190 deletions(-)

considering the danger involved, this is just way too large, and
there's no reasonable debugging i can do in the bisection to narrow
it down any further.

Please resubmit with the bug fixed and with a proper splitup, the
more patches you manage to create, the better. For a dangerous code
area like this, with a track record of frequent breakages in the
past, i would not mind a "one line of code changed per patch" splitup
either. (Feel free to send a git tree link for us to try as well.)

Ingo

Thanks for the feedback, Ingo. I'll test the above config and look at
splitting up the patch. The difficulty is making each patch independently
compilable and testable.

FWIW, I'm getting past the "crashes very, very early" stage with this
series applied when booting under Xen. Then it crashes pretty early,
but that's not your fault...

J

Hi Jeremy,

Yes, we have a simulator for Nehalem that also breezes past the boot-up
problem (it actually makes it to the kernel login prompt). Weirdly, the
problem doesn't exist in an earlier code base, so my changes are tickling
something else newly introduced. I'm attempting to see if I can use
GRUB 2 with the GDB stubs to track it down (which is time-consuming in
itself to set up).

It is definitely related to basing percpu variable offsets from %gs and
(I think) interrupts.

Hi Mike,

Have you made any progress on this? I'm bumping up against it when I run on native hardware (as opposed to under Xen).

J