Re: Linux 2.6.24-rc5 x86 architecture no longer Oopses...

From: Ingo Molnar
Date: Thu Dec 20 2007 - 18:50:49 EST



* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> Yes - I don't know why the smp_processor_id() test has suddenly
> started triggering in there.

it's a "must not happen".

here:

> __raw_spin_lock(&die.lock);
> raw_local_save_flags(flags);
> - die.lock_owner = smp_processor_id();
> + die.lock_owner = raw_smp_processor_id();

we just disabled irqs with raw_local_save_flags().


here:

> mem_parity_error(unsigned char reason, struct pt_regs * regs)
> {
> printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
> - "CPU %d.\n", reason, smp_processor_id());
> + "CPU %d.\n", reason, raw_smp_processor_id());
> printk(KERN_EMERG "You have some hardware problem, likely on the PCI bus.\n");

we are straight into an NMI which has hardirqs disabled.


> printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
> - "CPU %d.\n", reason, smp_processor_id());
> + "CPU %d.\n", reason, raw_smp_processor_id());

ditto.

> @@ -708,7 +708,7 @@ void __kprobes die_nmi(struct pt_regs *r
> bust_spinlocks(1);
> printk(KERN_EMERG "%s", msg);
> printk(" on CPU%d, ip %08lx, registers:\n",
> - smp_processor_id(), regs->ip);
> + raw_smp_processor_id(), regs->ip);

same.

it needs to be found out why the preempt_count suddenly went to zero. Is
task struct corruption out of question?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/