Re: [PATCH 1/1] sched: Make schedstats a runtime tunable that is disabled by default v4

From: Ingo Molnar
Date: Wed Feb 03 2016 - 07:49:36 EST



* Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Feb 03, 2016 at 12:28:49PM +0100, Ingo Molnar wrote:
> >
> > * Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Changelog since v3
> > > o Force enable stats during profiling and latencytop
> > >
> > > Changelog since v2
> > > o Print stats that are not related to schedstat
> > > o Reintroduce a static inline for update_stats_dequeue
> > >
> > > Changelog since v1
> > > o Introduce schedstat_enabled and address Ingo's feedback
> > > o More schedstat-only paths eliminated, particularly ttwu_stat
> > >
> > > schedstats is very useful during debugging and performance tuning but it
> > > incurs overhead. As such, even though it can be disabled at build time,
> > > it is often enabled as the information is useful. This patch adds a
> > > kernel command-line and sysctl tunable to enable or disable schedstats on
> > > demand. It is disabled by default as someone who knows they need it can
> > > also learn to enable it when necessary.
> > >
> > > The benefits are workload-dependent but when it gets down to it, the
> > > difference will be whether cache misses are incurred updating the shared
> > > stats or not. [...]
> >
> > Hm, which shared stats are those?
>
> Extremely poor phrasing on my part. The stats share cache lines, and the impact
> partly depends on whether unrelated stats sit on the same cache line as the
> fields being updated.

Yes, but the question is, are there true cross-CPU cache-misses? I.e. are there
any 'global' (or per node) counters that we keep touching and which keep
generating cache-misses?

> > I think we should really fix those as well: those shared stats should be
> > percpu collected as well, with no extra cache misses in any scheduler fast
> > path.
>
> I looked into that, but converting those stats to per-cpu counters would incur
> sizable memory overhead. There are a *lot* of them, and the basic structure of
> the generic percpu_counter is:
>
> struct percpu_counter {
>         raw_spinlock_t lock;
>         s64 count;
> #ifdef CONFIG_HOTPLUG_CPU
>         struct list_head list;  /* All percpu_counters are on a list */
> #endif
>         s32 __percpu *counters;
> };

We don't have to reuse struct percpu_counter.
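
For example (a sketch only, with made-up names), each statistic could be a
plain u64 in a structure that is already per-CPU, with no lock, no aggregate
s64 and no hotplug list:

#include <linux/percpu.h>
#include <linux/types.h>

/* Sketch, not the actual patch: 8 bytes per statistic per CPU, updated
 * with no locking at all. */
struct example_schedstats {
        u64 ttwu_count;
        u64 ttwu_local;
        u64 sched_count;
};

static DEFINE_PER_CPU(struct example_schedstats, example_schedstats);

static inline void example_inc_ttwu(void)
{
        this_cpu_inc(example_schedstats.ttwu_count);
}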

> That's not taking into account the associated runtime overhead, such as
> synchronising them.

Why do we have to synchronize them in the kernel? User-space can recover them on a
per-CPU basis and add them up if it wishes to. We can update the schedstat utility
to handle the more spread-out fields as well.

Thanks,

Ingo