Re: [patch 00/16] CFS Bandwidth Control v7

From: Ingo Molnar
Date: Fri Jul 01 2011 - 08:28:55 EST



* Hu Tao <hutao@xxxxxxxxxxxxxx> wrote:

> > Yeah, these numbers look pretty good. Note that the percentages
> > in the third column (the fraction of time that particular event
> > was actually measured) are pretty low, and it would be nice to
> > eliminate that multiplexing: now that we know the ballpark
> > figures, do very precise measurements that do not over-commit
> > the PMU.
> >
> > One such measurement would be:
> >
> > -e cycles -e instructions -e branches
> >
> > I think this should also bring the stddev percentages down, to
> > below 0.1%.
> >
> > Another measurement would be to test not just the feature-enabled
> > but also the feature-disabled cost - so that we document the
> > rough overhead that users of this new scheduler feature should
> > expect.
> >
> > Organizing it into neat before/after numbers and percentages,
> > comparing it with noise (stddev) [i.e. determining that the
> > effect we measure is above noise] and putting it all into the
> > changelog would be the other goal of these measurements.
>
> Hi Ingo,
>
> I've tested pipe-test-100k in the following cases: base (no patch),
> with the patch but the feature disabled, and with the patch and
> several periods (quota set to a large value so that the processes
> are not throttled). The results are:
>
>
>                                            cycles                  instructions            branches
> -------------------------------------------------------------------------------------------------------------------
> base                                       7,526,317,497           8,666,579,347           1,771,078,445
> +patch, cgroup not enabled                 7,610,354,447 (1.12%)   8,569,448,982 (-1.12%)  1,751,675,193 (-1.10%)
> +patch, 10000000000/1000(quota/period)     7,856,873,327 (4.39%)   8,822,227,540 (1.80%)   1,801,766,182 (1.73%)
> +patch, 10000000000/10000(quota/period)    7,797,711,600 (3.61%)   8,754,747,746 (1.02%)   1,788,316,969 (0.97%)
> +patch, 10000000000/100000(quota/period)   7,777,784,384 (3.34%)   8,744,979,688 (0.90%)   1,786,319,566 (0.86%)
> +patch, 10000000000/1000000(quota/period)  7,802,382,802 (3.67%)   8,755,638,235 (1.03%)   1,788,601,070 (0.99%)
> -------------------------------------------------------------------------------------------------------------------
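
For reference, each percentage above is the relative change against the base row, i.e. (patched - base) / base * 100. Here is a minimal Python sketch that recomputes those deltas from the raw counts in the table. The helper names pct_delta() and table_deltas() are made up for illustration. It is a quick way to double-check the numbers before they go into the changelog:

```python
# Raw counts copied from the table above: (cycles, instructions, branches).
BASE = (7_526_317_497, 8_666_579_347, 1_771_078_445)

# Row labels abbreviated from the table; the numbers are quota/period.
RUNS = {
    "+patch, cgroup not enabled": (7_610_354_447, 8_569_448_982, 1_751_675_193),
    "+patch, 10000000000/1000": (7_856_873_327, 8_822_227_540, 1_801_766_182),
    "+patch, 10000000000/10000": (7_797_711_600, 8_754_747_746, 1_788_316_969),
    "+patch, 10000000000/100000": (7_777_784_384, 8_744_979_688, 1_786_319_566),
    "+patch, 10000000000/1000000": (7_802_382_802, 8_755_638_235, 1_788_601_070),
}


def pct_delta(new: int, base: int) -> float:
    """Relative change of `new` versus `base`, in percent, rounded to 2 places."""
    return round((new - base) / base * 100, 2)


def table_deltas() -> dict:
    """Per-row percentage deltas for (cycles, instructions, branches)."""
    return {
        name: tuple(pct_delta(n, b) for n, b in zip(counts, BASE))
        for name, counts in RUNS.items()
    }


if __name__ == "__main__":
    for name, deltas in table_deltas().items():
        print(f"{name:32s}" + "".join(f"{d:+8.2f}%" for d in deltas))
```

The branches delta for the cgroup-disabled row works out to about -1.10%. That is in line with the -1.12% drop in instructions.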

Ok, I had a quick look at the stddev numbers as well, and most seem
to be below 0.1%, well below the effects you managed to measure.
So I think this table is pretty accurate and we can rely on it for
analysis.
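
To make the "above noise" check explicit, a rough sketch along these lines could be used. It makes three assumptions:

- The two runs are independent.
- Their relative stddevs can be combined in quadrature (for example the "( +- x% )" figure perf stat prints for repeated runs).
- A delta larger than k times that combined noise counts as real.

The function name and the k=3 threshold are arbitrary choices for illustration, not anything perf provides:

```python
import math


def effect_above_noise(base_mean, base_rsd_pct, new_mean, new_rsd_pct, k=3.0):
    """Return (delta_pct, noise_pct, significant).

    delta_pct:   relative change of new_mean versus base_mean, in percent.
    noise_pct:   the two relative stddevs (in percent) combined in
                 quadrature; this assumes the runs are independent.
    significant: True if |delta_pct| exceeds k * noise_pct
                 (k=3 is an arbitrary, illustrative threshold).
    """
    delta_pct = (new_mean - base_mean) / base_mean * 100.0
    noise_pct = math.hypot(base_rsd_pct, new_rsd_pct)
    return delta_pct, noise_pct, abs(delta_pct) > k * noise_pct


if __name__ == "__main__":
    # Cycles, base versus "+patch, cgroup not enabled", assuming 0.1% noise each.
    print(effect_above_noise(7_526_317_497, 0.1, 7_610_354_447, 0.1))
```

For example, with 0.1% noise on both sides, the +1.12% cycles delta of the cgroup-disabled row clears a 3-sigma threshold of about 0.42% comfortably.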

So we've got a 1.1% increase in cycles with cgroups disabled, even
though the instruction count went down by 1.1%. Is this expected? If
you profile stalled cycles and use perf diff between the base and
patched kernels, does it show some new hotspot that causes the
overhead?

To better understand the reasons behind that result, could you check
whether the cycle count is stable across reboots as well, or whether
it varies by more than the ~1% effect you measured?
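
The across-reboot check can be summarized the same way: measure the same kernel over several boots and compare the boot-to-boot spread with the ~1% effect. Below is a minimal sketch. The spread metric ((max - min) / mean) and the function names are assumptions picked for simplicity. The inputs would be per-boot cycle counts from your own runs; the numbers in the usage line are purely hypothetical:

```python
from statistics import mean


def boot_spread_pct(per_boot_counts):
    """Boot-to-boot spread of one kernel's measurements: (max - min) / mean, in percent."""
    m = mean(per_boot_counts)
    return (max(per_boot_counts) - min(per_boot_counts)) / m * 100.0


def effect_exceeds_boot_spread(base_boots, patched_boots):
    """True if the difference between the per-boot means of the two kernels
    is larger than the across-reboot spread seen for either kernel."""
    delta_pct = (mean(patched_boots) - mean(base_boots)) / mean(base_boots) * 100.0
    spread = max(boot_spread_pct(base_boots), boot_spread_pct(patched_boots))
    return abs(delta_pct) > spread


if __name__ == "__main__":
    # Hypothetical per-boot cycle counts (in units of 1e7), not real measurements.
    print(effect_exceeds_boot_spread([752.6, 753.0, 752.2], [761.0, 761.2, 760.9]))
```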

One thing that can help validate the measurements is to run the
following before testing:

echo 1 > /proc/sys/vm/drop_caches

This helps re-establish the whole pagecache layout, which is
responsible for much of the across-boot variability of such
measurements.

Thanks,

Ingo