Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

From: Thomas Gleixner
Date: Wed Sep 26 2007 - 17:35:00 EST


Rafael,

On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > (the suspend patches don't even touch the boot code, so they should be
> > > > irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > for 2.6.23-rc8) is applied in addition. Is this expected?
> > >
> > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > kernel command line.
> >
> > Seems to be reproducible, though. I'll investigate further.
>
> So far, the results are the following:
>
> 1) current Linus' tree doesn't boot with any command line (regression)
>
> [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
>
> x86-64: Disable local APIC timer use on AMD systems with C1E
>
> It's not necessary for 2.6.23 and actually kills the box that it's supposed to fix. ]
>
> 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> patch applied behaves like the current -git
>
> 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_

OK, this explains 2) and 3). I just looked into the code and the logic
vs. noapictimer on SMP is completely broken.

On i386 the noapictimer option not only disables the local APIC timer,
it also registers the CPUs for broadcasting via IPI on SMP systems.

The x86_64 code uses the broadcast only when the local APIC timer is
active, i.e. when "noapictimer" is not on the command line. This defeats
the whole purpose of "noapictimer": it should be there to make boxen work
where the local APIC timer actually has a hardware problem, e.g. the
nx6325.
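
To make the difference between the two architectures concrete, here is a
small toy model of the behaviour described above. It is not kernel code:
the names TickSetup, i386_setup, x86_64_setup and per_cpu_tick_available
are made up for illustration, and the i386 case without "noapictimer" is
an assumption, since it is not spelled out in this mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickSetup:
    """Hypothetical summary of how a CPU could get its periodic tick."""
    lapic_timer_used: bool   # the per-CPU local APIC timer drives the tick
    ipi_broadcast: bool      # CPUs are registered for tick broadcast via IPI


def i386_setup(noapictimer: bool, smp: bool) -> TickSetup:
    # As described in the mail: "noapictimer" disables the local APIC timer
    # and, on SMP, registers the CPUs for broadcasting via IPI.
    if noapictimer:
        return TickSetup(lapic_timer_used=False, ipi_broadcast=smp)
    # Assumption: without the option the local APIC timer is used. The
    # broadcast state for this case is not described and is left off here.
    return TickSetup(lapic_timer_used=True, ipi_broadcast=False)


def x86_64_setup(noapictimer: bool, smp: bool) -> TickSetup:
    # As described in the mail: broadcast is only set up while the local
    # APIC timer is active, i.e. when "noapictimer" is NOT given.
    lapic_active = not noapictimer
    return TickSetup(lapic_timer_used=lapic_active,
                     ipi_broadcast=lapic_active and smp)


def per_cpu_tick_available(setup: TickSetup) -> bool:
    # A CPU has some per-CPU tick source if either mechanism is in place.
    return setup.lapic_timer_used or setup.ipi_broadcast


if __name__ == "__main__":
    for name, fn in (("i386", i386_setup), ("x86_64", x86_64_setup)):
        setup = fn(noapictimer=True, smp=True)
        print(name, setup, per_cpu_tick_available(setup))
```

In this model "noapictimer" on SMP leaves the i386 setup with the IPI
broadcast as a fallback, while the x86_64 setup ends up with neither
mechanism. That is the broken case.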

The current x86_64 implementation only fixes the ACPI C-state related
problem where the local APIC timer stops in C3 (or C2), nothing else.

On the nx6325 and other AMD X2 equipped systems which have C1E enabled
we run into the following:

PIT keeps jiffies (and the system) running, but the local APIC timer
interrupts can get out of sync due to this C1E effect.

I don't think this is a critical problem, but it is wrong nevertheless.

I think it's safe to revert the C1E patch and postpone the fix to the
clock events conversion.

> "apicmaintimer"

on your box is not going to work. See the C1E patch. "apicmaintimer"
switches off PIT and then waits forever for the local APIC timer
interrupts.
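
A rough sketch of why that hangs, again as a toy model and not kernel
code. The function ticks_seen and its parameters are hypothetical, and
the endless wait is cut down to a bounded number of polls so the example
terminates:

```python
def ticks_seen(apicmaintimer: bool, lapic_irqs_arrive: bool,
               polls: int = 5) -> int:
    """Count timer ticks observed during a bounded number of polls."""
    # "apicmaintimer" switches the PIT off and relies on the local APIC
    # timer alone (as described in the mail).
    pit_enabled = not apicmaintimer
    seen = 0
    for _ in range(polls):
        if pit_enabled:
            seen += 1      # PIT keeps jiffies (and the system) running
        elif lapic_irqs_arrive:
            seen += 1      # local APIC timer is the only tick source
        # otherwise: nothing arrives, the real code would keep waiting
    return seen


if __name__ == "__main__":
    # Assumed to match the box discussed here: PIT off, no LAPIC ticks.
    print(ticks_seen(apicmaintimer=True, lapic_irqs_arrive=False))
```

With the PIT switched off and no local APIC timer interrupts arriving,
the count stays at zero however long the loop runs.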

> 4) 2.6.22 behaves like 2.6.23-rc8

No surprise.

> 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
> "noapictimer"
>
> 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
> "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
> without any extra command line options

That's consistent behaviour.

> Tested for a couple of times with each kernel, the results seem to be
> reproducible 100% of the time.

Thanks for going through this debug marathon.

tglx

