Re: [PATCH] irq/timings: Fix model validity

From: Peter Zijlstra
Date: Wed Nov 07 2018 - 08:05:28 EST


On Wed, Nov 07, 2018 at 11:52:31AM +0100, Daniel Lezcano wrote:
> > @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> > */
> > diff = interval - irqs->avg;
> >
> > + /*
> > + * Online average algorithm:
> > + *
> > + * new_average = average + ((value - average) / count)
> > + *
> > + * The variance computation depends on the new average
> > + * to be computed here first.
> > + *
> > + */
> > + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
> > +
> > + /*
> > + * Online variance algorithm:
> > + *
> > + * new_variance = variance + (value - average) x (value - new_average)
> > + *
> > + * Warning: irqs->avg is updated with the line above, hence
> > + * 'interval - irqs->avg' is no longer equal to 'diff'
> > + */
> > + irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
> > +
> > /*
> > * Increment the number of samples.
> > */
> > irqs->nr_samples++;

FWIW, I'm confused on this. The normal (Welford's) online algorithm
does:

count++;
delta = value - mean;
mean += delta / count;
M2 += delta * (value - mean);

But the above uses:

mean += delta / 32;

Which, for count >> 32, over-estimates the mean adjustment. But worse,
it significantly under-estimates the mean during training.

How is the computed variance still correct with this? I can not find any
comments that clarifies this. I'm thinking that since the mean will
slowly wander towards it's actual location (assuming an actual standard
distribution input) the resulting variance will be far too large, since
the (value - mean) term will be much larger than 'expected'.

> > @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> > * more than 32 and dividing by 32 instead of 31 is enough
> > * precise.
> > */
> > + variance = irqs->variance >> IRQ_TIMINGS_SHIFT;

Worse; variance is actually (as the comment states):

s^2 = M2 / (count -1)

But instead you compute:

s^2 = M2 / 32;

Which is again much larger than the actual result; assuming count >> 32.

So you compute a variance that is inflated in two different ways.


I'm not seeing how this thing works reliably.