RE: IO-APIC problem with 2.6.14-rt9

From: Pallipadi, Venkatesh
Date: Fri Nov 11 2005 - 21:25:48 EST




>-----Original Message-----
>From: linux-kernel-owner@xxxxxxxxxxxxxxx
>[mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of john stultz
>Sent: Thursday, November 10, 2005 1:43 PM
>To: Ingo Molnar
>Cc: dino@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Thomas Gleixner
>Subject: Re: IO-APIC problem with 2.6.14-rt9
>
>On Thu, 2005-11-10 at 22:04 +0100, Ingo Molnar wrote:
>> * john stultz <johnstul@xxxxxxxxxx> wrote:
>>
>> > > > //#define ARCH_HAS_READ_CURRENT_TIMER 1
>> > > >
>> > > > to:
>> > > >
>> > > > #define ARCH_HAS_READ_CURRENT_TIMER 1
>> > > >
>> > > > ?
>> > >
>> > > It works !! Thanks Ingo for the immediate response
>> >
>> > Hrm. Could you post the value for BogoMIPS that you're getting now?
>> >
>> > My patches touch the __delay() code, since using the TSC based
>> > delay has just as many, if not more, problems as the loop based
>> > delay. So I want to be careful that my changes are not further
>> > causing problems.
>> >
>> > Ingo, did you comment out ARCH_HAS_READ_CURRENT_TIMER because of
>> > problems with the new calibration code?
>>
>> yes. traces show that the new calibration code results in a bogomips
>> value on Athlon64 CPUs that halves the timeout. I.e. udelay(100) now
>> takes 50 usecs (!). The calibration code seems to assume the number
>> of cycles == number of loops in __delay() - that is not valid.
>
>Yea, that makes sense, because the READ_CURRENT_TIMER calibration is
>all TSC based and with my code we use the loop based delay (since the
>TSC based one can have a number of problems). So that doesn't mesh well
>when the loop/cycle values are not equivalent.
>
>That still leaves open the question why Dinakar is seeing issues w/ the
>loop based calibration, but I've got some similar hardware in my lab,
>so I can probably work that out.
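
To make the scaling explicit: a loop-based udelay() converts microseconds
into delay-loop iterations through loops_per_jiffy, so a calibration
result that comes out a factor of two too low halves every delay, which
matches the udelay(100) -> 50 usecs behaviour above. A small user-space
arithmetic sketch (the numbers, HZ value and names are made up purely
for illustration, this is not the actual arch delay code):

#include <stdio.h>

#define HZ 1000UL			/* illustrative tick rate */

/* Loop count a loop-based udelay(usecs) would request (rounding elided). */
static unsigned long udelay_loops(unsigned long usecs, unsigned long lpj)
{
	return (unsigned long)((unsigned long long)usecs * lpj * HZ / 1000000);
}

int main(void)
{
	/* Made-up numbers: the machine really runs 2,000,000 delay loops
	 * per jiffy, but calibration may report only half of that. */
	unsigned long lpj_true = 2000000, lpj_low = 1000000;
	unsigned long want = 100;	/* udelay(100) */

	unsigned long l_ok  = udelay_loops(want, lpj_true);
	unsigned long l_bad = udelay_loops(want, lpj_low);

	/* Convert the requested loops back to real time at the true rate. */
	unsigned long us_ok  = (unsigned long)(l_ok  * 1000000ULL /
					((unsigned long long)lpj_true * HZ));
	unsigned long us_bad = (unsigned long)(l_bad * 1000000ULL /
					((unsigned long long)lpj_true * HZ));

	printf("correct lpj: %lu loops -> ~%lu us\n", l_ok, us_ok);
	printf("lpj 2x low:  %lu loops -> ~%lu us\n", l_bad, us_bad);
	return 0;
}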

The reason ARCH_HAS_READ_CURRENT_TIMER and related code was added is:
due to SMIs happening during calibration, the previous calibrate_delay()
algorithm was returning very low values. With such a low value we don't
wait long enough during timer initialization, where we expect to see a
certain number of timer interrupts, and we panic() when we don't see
that many interrupts.

All the above was with TSC based delay.
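
For reference, the previous algorithm is roughly the shape below (a
simplified sketch, not the exact kernel code; __delay() here is whichever
delay implementation is in use). An SMI that fires while __delay() is
spinning inflates the apparent duration of the probe, so a candidate that
should still fit inside one tick appears to overrun it, and the search
settles on a loops_per_jiffy that is far too low:

/* Simplified sketch of the classic doubling/probing calibration. */
static unsigned long calibrate_delay_sketch(void)
{
	unsigned long lpj = 1 << 12;
	unsigned long ticks;

	/* Coarse pass: keep doubling until __delay(lpj) spans a tick. */
	while ((lpj <<= 1) != 0) {
		ticks = jiffies;
		while (ticks == jiffies)
			;		/* sync to a tick edge */
		ticks = jiffies;
		__delay(lpj);		/* an SMI here makes lpj look like
					 * it already spans a jiffy, ending
					 * the search too early */
		if (ticks != jiffies)
			break;
	}
	lpj >>= 1;
	/* (the real code then refines the low-order bits of lpj) */
	return lpj;
}
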
I think we can let calibrate_delay() do the calibration using
READ_CURRENT_TIMER (giving TSCs per jiffy), then convert that to some
number of TSCs per loop, and use the result in the loop based delay.
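
Something along these lines might work (a rough sketch of the conversion
only; the helper names are illustrative, rdtscll() is the i386 TSC-read
macro, and loops_per_jiffy/__delay() are the usual loop-based delay
pieces):

/* Rough sketch: rescale a TSC-based calibration for the loop delay. */
static unsigned long tsc_per_delay_loop(void)
{
	unsigned long long t0, t1;
	const unsigned long probe = 1UL << 16;

	rdtscll(t0);		/* TSC before */
	__delay(probe);		/* 'probe' loop-based delay iterations */
	rdtscll(t1);		/* TSC after */

	return (unsigned long)((t1 - t0) / probe);
}

static void convert_lpj(unsigned long tsc_per_jiffy)
{
	/* (TSCs per jiffy) / (TSCs per loop) = loops per jiffy
	 * for the loop-based __delay()/udelay(). */
	loops_per_jiffy = tsc_per_jiffy / tsc_per_delay_loop();
}

An SMI landing inside the short probe window could still skew the
per-loop number, so in practice we'd probably want to take the minimum
over a few probe runs.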

Thanks,
Venki