Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

From: Eric W. Biederman
Date: Wed Jul 09 2008 - 18:42:16 EST


Mike Travis <travis@xxxxxxx> writes:

> Very cool, thanks!!! I will start using this. (I have been using the trick
> to replace printk with early_printk so messages come out immediately instead
> of from the log buf.)

Just passing earlyprintk=xxx on the command line should have that effect.
Although I do admit you have to be a little bit into the boot before early_printk
is set up.

> I've been able to make some more progress. I've gotten to a point where it
> panics from stack overflow. I've verified this by bumping THREAD_ORDER and
> it boots fine. Now tracking down stack usages. (I have found a couple of new
> functions using set_cpus_allowed(..., CPU_MASK_ALL) instead of
> set_cpus_allowed_ptr(... , CPU_MASK_ALL_PTR). But these are not in the calling
> sequence so subsequently are not the cause.)

Is stack overflow the only problem you are seeing or are there still other mysteries?

> One weird thing is early_idt_handler seems to have been called and that's one
> thing our simulator does not mimic for standard Intel FSB systems - early
> pending interrupts. (It's designed after all to mimic our h/w, and of course
> it's been booting fine under that environment.)

That usually indicates you are taking an exception during boot, not that you
have received an external interrupt. Something like a page fault or a
division by 0 error.
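
As a rough illustration of that distinction, here is a minimal sketch. It
assumes the standard x86 convention that vectors 0 through 31 are reserved
by the architecture (CPU exceptions, plus NMI at vector 2) and that higher
vectors are what external interrupts use. The function names are made up
for illustration and are not kernel code.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if the vector is a CPU exception.
 * Assumes the x86 layout: 0-31 architecture-reserved, 2 is NMI. */
int sketch_vector_is_exception(unsigned int vector)
{
	assert(vector < 256);
	return vector < 32 && vector != 2;
}

/* Hypothetical helper: short description of a vector number. */
const char *sketch_describe_vector(unsigned int vector)
{
	assert(vector < 256);
	switch (vector) {
	case 0:
		return "divide error (exception)";
	case 2:
		return "NMI (interrupt)";
	case 13:
		return "general protection fault (exception)";
	case 14:
		return "page fault (exception)";
	default:
		return vector < 32 ? "other CPU exception"
				   : "interrupt (external or software)";
	}
}
```

So if the early handler reports a vector below 32, the first suspect is a
fault in the boot code itself rather than a device raising an interrupt.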

> Only a few of these though I would think might get called early in
> the boot, that might also be contributing to the stack overflow.

Still, the call chain depth shouldn't really be changing. So why should it
matter? Ah. The high cpu count is growing cpumask_t, so it hurts when you put
it on the stack. That makes sense. So what starts out as a 4 byte
variable on the stack in a normal setup winds up being a 512 byte variable
with 4k cpus.
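
To put rough numbers on that, here is a minimal userspace sketch. It
assumes cpumask_t is laid out as a plain bitmap with one bit per possible
CPU, rounded up to whole unsigned longs. Every name starting with sketch_
or SKETCH_ is made up for illustration and is not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4096
#define SKETCH_LONGS (SKETCH_NR_CPUS / (CHAR_BIT * sizeof(unsigned long)))

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
struct sketch_cpumask {
	unsigned long bits[SKETCH_LONGS];
};

/* Bytes needed for nr_cpus bits, rounded up to whole unsigned longs. */
size_t sketch_cpumask_bytes(size_t nr_cpus)
{
	const size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	assert(nr_cpus > 0);
	return ((nr_cpus + bits_per_long - 1) / bits_per_long)
		* sizeof(unsigned long);
}

/* By pointer: only an address is passed, the mask is not copied. */
int sketch_mask_empty_byptr(const struct sketch_cpumask *mask)
{
	size_t i;

	for (i = 0; i < SKETCH_LONGS; i++)
		if (mask->bits[i])
			return 0;
	return 1;
}

/* By value: the caller has to hand over a full copy of the mask,
 * which is where the extra stack usage comes from. */
int sketch_mask_empty_byval(struct sketch_cpumask mask)
{
	return sketch_mask_empty_byptr(&mask);
}
```

With a 64-bit unsigned long the helper gives 8 bytes for 32 cpus and 512
bytes for 4096. Whatever the exact starting size, it is the growth that
eats the stack, and every by-value argument or local of that type pays it
again.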

> Oh yeah, I looked very closely at the differences in the assembler
> for vmlinux when compiled with 4.2.0 (fails) and 4.2.4 (which boots
> with the above mentioned THREAD_ORDER change) and except for some
> weirdness around ident_complete it seems to be the same code. But
> the per_cpu variables are in a completely different address order.
> I wouldn't think that the -j10 for make could cause this but I can
> verify that with -j1. But in any case, I'm sticking with 4.2.4 for
> now.

Reasonable. The practical problem is you are mixing a lot of changes
simultaneously and it confuses things: compiling with NR_CPUS=4096
and working out the bugs from a growing cpumask_t, putting the per cpu
area in a zero based segment, and putting the pda into the
per cpu area, all at the same time.

Who knows, maybe the only difference between 4.2.0 and 4.2.4 is that
4.2.4 optimizes its stack usage a little better and you don't see
a stack overflow.

It would be very, very good if we could separate out these issues,
especially the segment for the per cpu variables. We need something
like that.

Eric