Re: Linux 2.6.29-rc6

From: Ingo Molnar
Date: Tue Mar 17 2009 - 12:41:29 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 17 Mar 2009, Ingo Molnar wrote:
> >
> > That's the idea of my patch: to use not two endpoints but thousands
> > of measurement points.
>
> Umm. Except you don't.
>
> > By measuring more we can get a more precise result, and we also do
> > not assume anything about how much time passes between two
> > measurement points.
>
> That's fine, but your actual code doesn't _do_ that.
>
> > My 'delta' algorithm does not assume anything about how much time
> > passes between two measurement points - it calculates the slope and
> > keeps a rolling average of that slope.
>
> No, you keep a very bad measure of "some kind of random average of the
> last few points", which - if I read things right:
>
> - lacks precision (you really need to use 'double' floating point to do
> it well, otherwise the rounding errors will kill you). You seem to be
> aiming for a 10-bit fixed point thing, which may or may not work if
> done cleverly, but:
>
> - seems to be based on a rather weak averaging function which certainly
> will lose data over time.
>
> The thing is, the only _accurate_ average is the one done over
> long time distances. It's very true that your slope thing works
> very well over such long times, and you'd get accurate measurement
> if you did it that way, BUT THAT IS NOT WHAT YOU DO. You have a
> very tight loop, so you get very bad slopes, and then you use a
> weak averaging function to try to make them better, but it never
> does.

Hm, the intention there was to have a memory of ~1000 entries via a
decaying average of 1:1000.

In parallel to that there's also a noise estimator (which also decays
over time). So basically, when the observed noise is very low, we
essentially use the data from the last ~1000 measurements. (Well, not
exactly - the 'memory' of more recent data is stronger than that of
older data.)
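
Something like this, in rough sketch form - purely illustrative, the
names and the 10-bit shift are made up here, it's not the code from
the patch:

#include <stdint.h>

#define AVG_SHIFT 10	/* ~1:1024 decay - roughly 1000 samples of 'memory' */

struct slope_avg {
	int64_t avg;	/* decaying average of the measured slope */
	int64_t noise;	/* decaying average of |sample - avg|     */
};

static void slope_update(struct slope_avg *s, int64_t sample)
{
	int64_t delta = sample - s->avg;
	int64_t mag   = delta < 0 ? -delta : delta;

	/* move avg towards the sample by 1/1024 of the distance: */
	if (delta < 0)
		s->avg -= mag >> AVG_SHIFT;
	else
		s->avg += mag >> AVG_SHIFT;

	/* the noise estimate decays towards |delta| the same way: */
	if (mag < s->noise)
		s->noise -= (s->noise - mag) >> AVG_SHIFT;
	else
		s->noise += (mag - s->noise) >> AVG_SHIFT;
}

(And yes, this is where the fixed-point precision problem you point
out bites: per-sample deltas smaller than 1<<AVG_SHIFT get rounded
away entirely by the shift.)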

Again ... it's a clearly non-working patch, so it's not really a
defensible concept :-)

> Also, there seems to be a fundamental bug in your PIT reading
> routine. My fast-TSC calibration only looks at the MSB of the PIT
> read for a very good reason: if you don't use the explicit LATCH
> command, you may be getting the MSB of one counter value, and then
> the LSB of another. So your PIT read can easily be off by ~256 PIT
> cycles. Only by caring only for the MSB can you do an unlatched
> read!
>
> That is why pit_expect_msb() looks for the "edge" where the MSB
> changes, and never actually looks at the LSB.
>
> This issue may be an additional reason for your problems, although
> maybe your noise correction will be able to avoid those cases.

Indeed - though I did check the trace results via gnuplot yesterday
(suspecting PIT readout outliers), and there were no outliers.

For any final patch it's still a showstopper issue.
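
For reference, the failure mode in rough sketch form - purely
illustrative, inb()/outb() stand for the usual x86 port-I/O helpers
and this is not the real arch/x86/kernel/tsc.c code:

#include <stdint.h>

/* the usual x86 port-I/O helpers, assumed here: */
extern uint8_t inb(uint16_t port);
extern void outb(uint8_t val, uint16_t port);

/*
 * Unlatched 16-bit read of PIT channel 2: the counter keeps running
 * between the two inb()s, so the two bytes can come from two
 * different counter values and the result can be off by ~256 PIT
 * cycles.
 */
static uint16_t pit_read_unlatched(void)
{
	uint8_t lsb = inb(0x42);
	uint8_t msb = inb(0x42);

	return (msb << 8) | lsb;	/* possibly torn */
}

/*
 * The classic fix is the explicit latch command, which snapshots a
 * consistent count before it is read back:
 */
static uint16_t pit_read_latched(void)
{
	uint8_t lsb, msb;

	outb(0x80, 0x43);		/* counter-latch command, channel 2 */
	lsb = inb(0x42);
	msb = inb(0x42);

	return (msb << 8) | lsb;
}

/*
 * The alternative - the idea behind pit_expect_msb() - is to never
 * look at the LSB at all and only watch for the edge where the MSB
 * changes:
 */
static int pit_msb_is(uint8_t expect)
{
	inb(0x42);			/* throw away the LSB */
	return inb(0x42) == expect;	/* only trust the MSB */
}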

But the source of error and miscalibration is elsewhere.

Ingo