Re: Realtime Preemption, 2.6.12, Beginners Guide?

From: Ingo Molnar
Date: Thu Jul 07 2005 - 10:00:46 EST



* Alistair John Strachan <s0348365@xxxxxxxxxxxx> wrote:

> http://devzero.co.uk/~alistair/oops1.jpeg
>
> I disabled the trace and the STACKOVERFLOW option seems to help; I've
> got a (slightly truncated) oops from the kernel. What happens is that
> I get an oops, then I get a BUG: warning me about the softlock, then I
> get another oops. I'm about to reboot to confirm whether the second
> oops is identical to the first (I suspect that it is).

unfortunately the EIP is at 0xedc, which is a corrupted value. The stack
trace portion that is visible on the screen is the usual pagefault trace
- without any information about the crash site itself. What the oops
tells us is that it's the openvpn process that crashed (if this was the
first oops). The preempt_count is 0x20010004, which shows us that this
was a section that had soft-IRQ-flags disabled and was in a hardirq
context. (see the meaning of the preempt bits at the top of
include/linux/hardirq.h) That it's a hardirq handler that crashed is
further corroborated by the esp, which points into a kernel data area
(hardirq_ctx[], the 4K irq stacks), not into the process's kernel stack
(which is at threadinfo).
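
(for illustration, not part of the original mail: a minimal sketch of how a
preempt_count value can be split into its fields. The masks below are an
assumption, taken from the layout comment in mainline 2.6.12
include/linux/hardirq.h; the -RT tree may lay the bits out differently, and
the names decode_preempt_count and preempt_fields are hypothetical. Bits
28-31 are returned undecoded, since their meaning depends on the tree.)

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: field layout as documented in mainline 2.6.12 hardirq.h. */
#define PREEMPT_MASK  0x000000ffu  /* bits 0-7:   preemption depth */
#define SOFTIRQ_MASK  0x0000ff00u  /* bits 8-15:  softirq count    */
#define HARDIRQ_MASK  0x0fff0000u  /* bits 16-27: hardirq count    */
#define HIGH_MASK     0xf0000000u  /* bits 28-31: tree-specific    */
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16

/* Hypothetical helper type, for illustration only. */
struct preempt_fields {
    unsigned preempt_depth;
    unsigned softirq_count;
    unsigned hardirq_count;
    uint32_t high_flags;   /* left unshifted, e.g. 0x20000000 */
};

/* Split a raw preempt_count value into the fields above. */
struct preempt_fields decode_preempt_count(uint32_t pc)
{
    struct preempt_fields f;

    f.preempt_depth = pc & PREEMPT_MASK;
    f.softirq_count = (pc & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
    f.hardirq_count = (pc & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
    f.high_flags    = pc & HIGH_MASK;
    return f;
}

/* Print the decoded fields, one per line. */
void print_preempt_count(uint32_t pc)
{
    struct preempt_fields f = decode_preempt_count(pc);

    printf("preempt depth: %u\n", f.preempt_depth);
    printf("softirq count: %u\n", f.softirq_count);
    printf("hardirq count: %u\n", f.hardirq_count);
    printf("high flags:    0x%08x\n", (unsigned)f.high_flags);
}
```

Under these assumed masks, decode_preempt_count(0x20010004) gives a hardirq
count of 1 (the hardirq context mentioned above), a softirq count of 0, a
preemption depth of 4, and leaves 0x20000000 in the high bits.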

the stack pointer itself looks healthy, it's near the end of a 4K page,
i.e. far from overflowing. So it would be really useful to get the full
oops output. (that way you can also be sure it's the first crash you are
seeing.)
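
(for illustration, not part of the original mail: a rough sketch of the
"near the end of a 4K page" check. It assumes the irq stack is 4K in size,
4K-aligned, and grows downward as on i386. The function names and the esp
value in the note below are hypothetical; the real esp is only visible in
the screenshot.)

```c
#include <stdint.h>

/* ASSUMPTION: 4K stacks, aligned on a 4K boundary, growing downward. */
#define IRQ_STACK_SIZE 4096u

/*
 * Bytes already pushed on the stack: the distance from esp up to the
 * end of its 4K page. Caveat: a completely empty stack has esp exactly
 * on the next page boundary, which this simple mask cannot distinguish
 * from a completely full one.
 */
uint32_t stack_bytes_used(uint32_t esp)
{
    return IRQ_STACK_SIZE - (esp & (IRQ_STACK_SIZE - 1));
}

/*
 * Bytes between esp and the start of its 4K page. This ignores the
 * thread_info structure at the bottom of the stack, so the usable
 * room is somewhat smaller than the returned value.
 */
uint32_t stack_bytes_free(uint32_t esp)
{
    return esp & (IRQ_STACK_SIZE - 1);
}
```

With a made-up esp of 0xc03d7f5c, stack_bytes_used() returns 164 and
stack_bytes_free() returns 3932, i.e. the stack is barely used and nowhere
near overflowing.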

(i doubt netconsole debugging will work, given that we are in a hardirq
context. Serial logging will work.)

one thing you could try is to apply the attached patch and reproduce the
crash. It should print a pure backtrace and lock the box up afterwards,
so that you can take a picture.

Ingo

--- linux/arch/i386/kernel/traps.c.orig
+++ linux/arch/i386/kernel/traps.c
@@ -323,6 +323,8 @@ void die(const char * str, struct pt_reg

 	if (++die.lock_owner_depth < 3) {
 		int nl = 0;
+		dump_stack();
+		for (;;) raw_local_irq_disable(); // freeze the box
 		handle_BUG(regs);
 		printk(KERN_ALERT "%s: %04lx [#%d]\n", str, err & 0xffff, ++die_counter);
 #ifdef CONFIG_PREEMPT