Re: [tip:sched/core] sched: cpuacct: Use bigger percpu counterbatch values for stats counters

From: Bharata B Rao
Date: Thu Jul 16 2009 - 04:12:02 EST


On Tue, May 12, 2009 at 07:44:37PM +0900, KOSAKI Motohiro wrote:
> ------------------------------------
> Subject: [PATCH] sched: cpuacct: Use bigger percpu counter batch values for stats counters
>
> percpu counters used to accumulate statistics in cpuacct controller use
> the default batch value [max(2*nr_cpus, 32)] which can be too small for
> archs that define VIRT_CPU_ACCOUNTING. In such archs, a tick could result in
> cputime updates in the range of thousands. As a result, cpuacct_update_stats()
> would end up acquiring the percpu counter spinlock on every tick which
> is not good for performance.
>
> Let those architectures to have a bigger batch threshold so that percpu counter
> spinlock isn't taken on every tick. This change doesn't affect the archs which
> don't define VIRT_CPU_ACCOUNTING and they continue to have the default
> percpu counter batch value.
>
> v7:
> - fix typo and changelog
>
> v6:
> - fix build error when UP
>
> v5:
> - move cpuacct_batch initialization into sched_init()
>
> v4:
> - rewrite patch description (thanks Bharata!)
> - append read_mostly to cpuacct_batch
> - cpuacct_batch is initialized by sched_init_debug()
>
> v3:
> - revert using percpu_counter_sum()
>
> v2:
> - use percpu_counter_sum() instead percpu_counter_read()
>
> Cc: Balaji Rao <balajirrao@xxxxxxxxx>
> Cc: Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxx>
> Cc: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
> Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
> ---
> kernel/sched.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: b/kernel/sched.c
> ===================================================================
> --- a/kernel/sched.c 2009-05-12 13:12:59.000000000 +0900
> +++ b/kernel/sched.c 2009-05-12 19:04:49.000000000 +0900
> @@ -870,6 +870,8 @@ static __read_mostly int scheduler_runni
> */
> int sysctl_sched_rt_runtime = 950000;
>
> +static __read_mostly s32 cpuacct_batch;
> +
> static inline u64 global_rt_period(void)
> {
> return (u64)sysctl_sched_rt_period * NSEC_PER_USEC;
> @@ -9284,6 +9286,10 @@ void __init sched_init(void)
>
> perf_counter_init();
>
> +#ifdef CONFIG_SMP
> + cpuacct_batch = jiffies_to_cputime(percpu_counter_batch);
> +#endif

On ppc64, calling jiffies_to_cputime() from sched_init() is too early because
jiffies_to_cputime() needs tb_ticks_per_sec which gets initialized only
later in time_init(). Because of this I see that cpuacct_batch will always
be zero effectively negating what this patch is trying to do.

As explained by you earlier, we too are finding the default batch value to
be too low for ppc64 with VIRT_CPU_ACCOUNTING turned on. Hence I guess
if this patch is taken in (ofcourse with the above issue fixed), it will
benefit ppc64 also.

Regards,
Bharata.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/