Re: rtla osnoise hist: average duration is always zero

From: Daniel Bristot de Oliveira
Date: Wed Dec 28 2022 - 10:25:33 EST


Hi Andreas,

On 12/24/22 13:39, Andreas Ziegler wrote:
> -- Observed in, but not limited to, Linux 6.1.1

Since original commit... The best way to report this is adding
a Fixes: tag. For example:

Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")

> rtla osnoise hist always outputs '0' as average duration value. Example:
>
> # rtla osnoise hist -P F:1 -c 0-1 -r 900000 -d 1M -b 1 -E 5000 -T 1
> # RTLA osnoise histogram
> # Time unit is microseconds (us)
> # Duration:   0 00:01:00
>   ...
> count:     5629      1364
> min:          1         1
> avg:          0         0
> max:       2955        56
>
> This is due to sum_sample in osnoise_hist_update_multiple() being calculated as the sum (duration), not as sum (duration * count).

Yeah, that is correct. It works on timerlat hist because timerlat hist collects
each trace event. osnoise hist uses in-kernel histograms, so we need to multiply
the value with the count. This is a leftover from the development phase, as I started
using tracing and then moved to histograms (which is better).

> Rounding, instead of truncating, of the average value would be cool.

I thought: the values were already rounded up, so it might be rounding too much.

But we are in user space. It is just easier to add a two digits precision
to the value, no?

> The following patch would solve the issue described above:
>
>
> Sampled duration must be weighted by observed quantity, to arrive at a
> correct average duration value.
>
> Fix calculation of total duration by summing (duration * count).
> Introduce rounding for calculation of final value.
>
> Signed-off-by: Andreas Ziegler <br015@xxxxxxxxxx>
>
> --- a/tools/tracing/rtla/src/osnoise_hist.c
> +++ b/tools/tracing/rtla/src/osnoise_hist.c
> @@ -121,6 +121,7 @@
>  {
>      struct osnoise_hist_params *params = tool->params;
>      struct osnoise_hist_data *data = tool->data;
> +    unsigned long long total_duration;
>      int entries = data->entries;
>      int bucket;
>      int *hist;
> @@ -131,10 +132,12 @@
>      if (data->bucket_size)
>          bucket = duration / data->bucket_size;
>
> +    total_duration = duration * count;
> +
>      hist = data->hist[cpu].samples;
>      data->hist[cpu].count += count;
>      update_min(&data->hist[cpu].min_sample, &duration);
> -    update_sum(&data->hist[cpu].sum_sample, &duration);
> +    update_sum(&data->hist[cpu].sum_sample, &total_duration);

How about re-seding a patch with the code above, adding the:

Fixes: 829a6c0b5698 ("rtla/osnoise: Add the hist mode")

and...

>      update_max(&data->hist[cpu].max_sample, &duration);
>
>      if (bucket < entries)
> @@ -333,7 +336,7 @@
>
>          if (data->hist[cpu].count)
>              trace_seq_printf(trace->seq, "%9llu ",
> -                    data->hist[cpu].sum_sample / data->hist[cpu].count);
> +                (data->hist[cpu].sum_sample + data->hist[cpu].count / 2) / data->hist[cpu].count);

another patch with this part, adding two digits precision?

>          else
>              trace_seq_printf(trace->seq, "        - ");
>      }
>
Thanks!
-- Daniel

> Kind regards,
> Andreas