Re: linux-next: Tree for June 13: IO APIC breakage on HP nx6325

From: Rafael J. Wysocki
Date: Tue Jun 17 2008 - 16:59:52 EST


On Tuesday, 17 of June 2008, Rafael J. Wysocki wrote:
> On Monday, 16 of June 2008, Maciej W. Rozycki wrote:
> > On Mon, 16 Jun 2008, Rafael J. Wysocki wrote:
> >
> > > > > commit 7e3530cd98a0c6ab38f5898e855a5beffab26561
> > > > > Author: Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>
> > > > > Date: Tue May 27 21:19:51 2008 +0100
> > > > >
> > > > > x86: I/O APIC: timer through 8259A second-chance
> > > > >
> > > > > Signed-off-by: Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>
> > > > > Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> > > >
> > > > Can I have .config used and a full bootstrap log from that system with
> > > > the patch still applied?
> > >
> > > That may be difficult, because with the patch applied the box either doesn't
> > > boot at all, or works unreliably when booted (depending on the set of patches
> > > applied on top of it).
> >
> > Serial console?
>
> No, this box doesn't have any serial ports. It has a FireWire one, but I don't
> have a matching cable ...
>
> > I'm most interested in one from a configuration that
> > does not boot at all as that's easier to reproduce, determine the cause
> > and verify whether a change fixes the problem or not. Other
> > configurations may then be tested with the fix in place.
>
> With the -next from today (20080616) I get a different picture.
>
> Without any patches on top it boots, but the fan goes to 100% as soon as
> the ACPI modules get loaded, regardless of the temperature (normally it does
> that only above 75 degrees C, which is practically never reached, because there
> are 3 temperature trip points below that level; generally the hardware only does
> that when overheating). After that, things get _very_ slow, about 10x slower
> than usual in X and somewhat slower in the fb console, but I was able to get
> a dmesg output. This is reproducible 100% of the time.
>
> With commit 7e3530cd98a0c6ab38f5898e855a5beffab26561 reverted the box seems to
> work normally.

To debug this problem a bit more, I applied the following change:

--- linux-next.orig/arch/x86/kernel/io_apic_64.c
+++ linux-next/arch/x86/kernel/io_apic_64.c
@@ -1667,7 +1667,7 @@ static inline void __init check_timer(vo
 	pin2 = ioapic_i8259.pin;
 	apic2 = ioapic_i8259.apic;

-	apic_printk(APIC_VERBOSE,KERN_INFO "..TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
+	printk(KERN_CRIT "TIMER: vector=0x%02X apic1=%d pin1=%d apic2=%d pin2=%d\n",
 		cfg->vector, apic1, pin1, apic2, pin2);

 	if (pin1 != -1) {

and found that apic1=0, pin1=2, apic2=-1, pin2=-1. Moreover, the
(!no_timer_check && timer_irq_works()) test evidently fails, so the timer
cannot be connected to apic1, but the patch forcibly ignores that, which in
turn, on this particular box, confuses the heck out of the northbridge.
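
For illustration only, here is a minimal userspace sketch of a conservative
routing policy, in which a pin is used only if it exists and its timer test
passed. This is not the kernel's check_timer(): the names pick_timer_route,
struct timer_pins and enum timer_route are hypothetical, the two booleans
stand in for the result of a test such as timer_irq_works(), and the further
fallbacks the real function has are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the routing choice; not kernel code. */
enum timer_route {
	ROUTE_IOAPIC_PIN1,	/* timer IRQ directly through the I/O APIC pin */
	ROUTE_VIA_8259A,	/* timer IRQ through the 8259A on the second pin */
	ROUTE_NONE		/* neither pin is usable */
};

/* The four values printed by the debug printk; -1 means "not found". */
struct timer_pins {
	int apic1, pin1;
	int apic2, pin2;
};

/*
 * Conservative policy: a pin is selected only if it exists AND the
 * timer test on it succeeded. A failed test is never overridden.
 */
static enum timer_route pick_timer_route(const struct timer_pins *p,
					 bool pin1_irq_works,
					 bool pin2_irq_works)
{
	assert(p != NULL);

	if (p->pin1 != -1 && pin1_irq_works)
		return ROUTE_IOAPIC_PIN1;
	if (p->pin2 != -1 && pin2_irq_works)
		return ROUTE_VIA_8259A;
	return ROUTE_NONE;
}
```

With the values reported above (apic1=0, pin1=2, apic2=-1, pin2=-1) and the
timer test failing, this model returns ROUTE_NONE rather than forcing the
timer onto pin1.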

May I gently ask that the patch ("x86: I/O APIC: timer through 8259A second-chance")
be reverted?

Thanks,
Rafael