Re: [PATCH] irq/timings: Fix model validity

From: Daniel Lezcano
Date: Wed Nov 07 2018 - 05:52:40 EST


On 07/11/2018 10:46, Peter Zijlstra wrote:
> On Wed, Nov 07, 2018 at 09:59:36AM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 07, 2018 at 12:39:31AM +0100, Rafael J. Wysocki wrote:
>
>>> In general, however, I need to be convinced that interrupts that
>>> didn't wake up the CPU from idle are relevant for next wakeup
>>> prediction. I see that this may be the case, but to what extent is
>>> rather unclear to me and it looks like calling
>>> irq_timings_next_event() would add considerable overhead.
>>
>> How about we add a (debug) knob so that people can play with it for now?
>> If it turns out to be useful, we'll learn.
>
> That said; Daniel, I think there is a problem with how irqs_update()
> sets irqs->valid. We seem to set valid even when we're still training.

Yes, the fix seems right.

Thanks for fixing it.

-- Daniel

> ---
> Subject: irq/timings: Fix model validity
>
> The per IRQ timing predictor will produce a 'valid' prediction even if
> the model is still training. This should not happen.
>
> Fix this by moving the actual training (online stddev algorithm) up a
> bit and returning early (before predicting) when we've not yet reached
> the sample threshold.
>
> A direct consequence is that the predictor will only ever run with at
> least that many samples, which means we can remove one branch.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> kernel/irq/timings.c | 66 +++++++++++++++++++++++++++++-----------------------
> 1 file changed, 37 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
> index 1e4cb63a5c82..5d22fd5facd5 100644
> --- a/kernel/irq/timings.c
> +++ b/kernel/irq/timings.c
> @@ -28,6 +28,13 @@ struct irqt_stat {
> int valid;
> };
>
> +/*
> + * The rule of thumb in statistics for the normal distribution
> + * is having at least 30 samples in order to have the model to
> + * apply.
> + */
> +#define SAMPLE_THRESHOLD 30
> +
> static DEFINE_IDR(irqt_stats);
>
> void irq_timings_enable(void)
> @@ -101,7 +108,6 @@ void irq_timings_disable(void)
> * distribution appears when the number of samples is 30 (it is the
> * rule of thumb in statistics, cf. "30 samples" on Internet). When
> * there are three consecutive anomalies, the statistics are resetted.
> - *
> */
> static void irqs_update(struct irqt_stat *irqs, u64 ts)
> {
> @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> */
> diff = interval - irqs->avg;
>
> + /*
> + * Online average algorithm:
> + *
> + * new_average = average + ((value - average) / count)
> + *
> + * The variance computation depends on the new average
> + * to be computed here first.
> + *
> + */
> + irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
> +
> + /*
> + * Online variance algorithm:
> + *
> + * new_variance = variance + (value - average) x (value - new_average)
> + *
> + * Warning: irqs->avg is updated with the line above, hence
> + * 'interval - irqs->avg' is no longer equal to 'diff'
> + */
> + irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
> +
> /*
> * Increment the number of samples.
> */
> irqs->nr_samples++;
>
> + /*
> + * If we're still training the model, we can't make any predictions yet.
> + */
> + if (irqs->nr_samples < SAMPLE_THRESHOLD)
> + return;
> +
> /*
> * Online variance divided by the number of elements if there
> * is more than one sample. Normally the formula is division
> @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> * more than 32 and dividing by 32 instead of 31 is enough
> * precise.
> */
> - if (likely(irqs->nr_samples > 1))
> - variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
> + variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
>
> /*
> - * The rule of thumb in statistics for the normal distribution
> - * is having at least 30 samples in order to have the model to
> - * apply. Values outside the interval are considered as an
> - * anomaly.
> + * Values outside the interval are considered as an anomaly.
> */
> - if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) {
> + if ((diff * diff) > (9 * variance)) {
> /*
> * After three consecutive anomalies, we reset the
> * stats as it is no longer stable enough.
> @@ -191,27 +220,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> */
> irqs->valid = 1;
>
> - /*
> - * Online average algorithm:
> - *
> - * new_average = average + ((value - average) / count)
> - *
> - * The variance computation depends on the new average
> - * to be computed here first.
> - *
> - */
> - irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
> -
> - /*
> - * Online variance algorithm:
> - *
> - * new_variance = variance + (value - average) x (value - new_average)
> - *
> - * Warning: irqs->avg is updated with the line above, hence
> - * 'interval - irqs->avg' is no longer equal to 'diff'
> - */
> - irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
> -
> /*
> * Update the next event
> */
>
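
For anyone following along without the tree at hand, the reordering in the
patch above can be sketched as a small stand-alone C function. This is only
an illustration, not the code in kernel/irq/timings.c: struct sketch_stat,
sketch_update() and the SKETCH_* macros are made-up names, the shift of 5
(divide by 32) is assumed from the "dividing by 32" comment, plain division
replaces the shift, and falling through to "valid" on a first or second
anomaly is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SHIFT            5   /* assumption: divide by 2^5 = 32 */
#define SKETCH_SAMPLE_THRESHOLD 30  /* same value as in the patch */

/* Hypothetical stand-in for the per-IRQ statistics. */
struct sketch_stat {
	int64_t avg;
	int64_t variance;
	unsigned int nr_samples;
	int anomalies;
	int valid;
};

/*
 * Feed one interval into the model. Returns 1 when the model is
 * considered valid for prediction, 0 otherwise.
 */
static int sketch_update(struct sketch_stat *s, int64_t interval)
{
	int64_t diff, variance;

	assert(s != NULL);

	s->valid = 0;
	diff = interval - s->avg;

	/* Training always runs first: online average, then variance. */
	s->avg += diff / (1 << SKETCH_SHIFT);
	s->variance += diff * (interval - s->avg);
	s->nr_samples++;

	/* Still training: no prediction, so never report 'valid'. */
	if (s->nr_samples < SKETCH_SAMPLE_THRESHOLD)
		return 0;

	variance = s->variance / (1 << SKETCH_SHIFT);

	/* Values more than three standard deviations away are anomalies. */
	if (diff * diff > 9 * variance) {
		if (++s->anomalies >= 3) {
			/* No longer stable: start training from scratch. */
			memset(s, 0, sizeof(*s));
			return 0;
		}
	} else {
		s->anomalies = 0;
	}

	s->valid = 1;
	return 1;
}
```

Feeding a zero-initialised sketch_stat the same interval repeatedly returns
0 for the first 29 calls, and only the 30th call can report the model as
valid, which is the behaviour the fix is after.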


--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog