Re: REGRESSION: the new i386 timer code fails to sync CPUs
From: Andrew Morton
Date: Sun Jul 30 2006 - 05:08:45 EST
On Mon, 24 Jul 2006 13:54:03 -0700
john stultz <johnstul@xxxxxxxxxx> wrote:
> On Mon, 2006-07-24 at 19:51 +0200, Andi Kleen wrote:
> > On Mon, Jul 24, 2006 at 07:17:11PM +0200, Matthias Urlichs wrote:
> > > You can probably assume that they're synced on systems with no more
> > > than one dual-core / hyperthreaded CPU.
> > >
> > > My system obviously has two of those.
> >
> > According to Intel on all of their chipsets/motherboard reference
> > designs all the sockets run from a single clock crystal.
> >
> > I've confirmed this for a long time on 64bit and even to some
> > extent on 32bit on distro kernels.
>
> Andi: Which 32bit distro patch are you referring to here?
>
> > Maybe you got a broken BIOS or similar though.
> >
> > > > Matthias: "clock=pmtmr" is probably the best workaround in the short
> > > > term. Could you send me your dmesg and dmidecode output? We'll try to
> > > > find something to key off of so it will mark the tsc as unstable by
> > > > default on your system.
> > > >
> > > I'd assume that finding (and, possibly, being unable to correct) TSC skew
> >
> > The BIOS normally guarantee it at boot. However maybe you got a broken one.
> >
> > We used to do TSC sync correction at boot on Intel, but stopped doing
> > that when we found out that the TSC sync code adds an error
> > To an already perfectly synchronized system.
> >
> > Actually I think i386 still does it, just x86-64 stopped
>
> Indeed i386 still does it. I knew x86-64 had a new ia64 inspired
> algorithm, but I didn't realize that they didn't even try to call it in
> most cases.
>
> The (untested) patch below will disable it on i386. Matthias, mind
> trying it out to see if the TSC sync code is causing the problem?
>
>
> > My first assumption would be that you hit a bug somewhere in the new
> > clock code. What happens when you boot an older kernel (like 2.6.17)
> > with clock=tsc ?
>
> Yes, that would be good to confirm the issue. :)
>
> thanks
> -john
>
>
> Hack out the i386 TSC sync code.
>
>
> diff --git a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c
> index 6f5fea0..cd28914 100644
> --- a/arch/i386/kernel/smpboot.c
> +++ b/arch/i386/kernel/smpboot.c
> @@ -435,7 +435,7 @@ static void __devinit smp_callin(void)
> /*
> * Synchronize the TSC with the BP
> */
> - if (cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
> + if (0 && cpu_has_tsc && cpu_khz && !tsc_sync_disabled)
> synchronize_tsc_ap();
> }
>
> @@ -1305,7 +1305,7 @@ static void __init smp_boot_cpus(unsigne
> /*
> * Synchronize the TSC with the AP
> */
> - if (cpu_has_tsc && cpucount && cpu_khz)
> + if (0 && cpu_has_tsc && cpucount && cpu_khz)
> synchronize_tsc_bp();
> }
I guess Matthias didn't test this patch. Can we get some obviously-correct
fix in place for 2.6.18?
Also, I was rather hoping we'd be able to work out why write_tsc() isn't
working on this CPU. If that's fixable, that would be the best fix for
this bug, no?
It is a "CPU0: Intel(R) Xeon(TM) CPU 3.00GHz stepping 03".
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/