Re: vmstat: use our own timer events

From: Andrew Morton
Date: Sun Apr 29 2007 - 04:16:09 EST


On Sat, 28 Apr 2007 22:09:04 -0700 (PDT) Christoph Lameter <clameter@xxxxxxx> wrote:

> vmstat is currently using the cache reaper to periodically bring the
> statistics up to date. The cache reaper exists in SLUB only
> as a way to provide compatibility with SLAB. This patch removes
> the vmstat calls from the slab allocators and provides its own
> handling.
>
> The advantage is also that we can use a different frequency for the
> updates. Refreshing vm stats is a pretty fast job so we can run this
> every second and stagger this by only one tick. This will lead to
> some overlap in large systems. For example, a system running at 250 HZ with
> 1024 processors will have 4 vm updates occurring at once.
>
> However, the vm stats update only accesses per node information.
> It is only necessary to stagger the vm statistics updates per
> processor in each node. Vm counter updates occurring on distant
> nodes will not cause cacheline contention.
>
> We could implement an alternate approach that runs the first processor
> on each node at the second and then each of the other processors on a
> node on a subsequent tick. That may be useful to keep a large amount
> of the second free of timer activity. Maybe the timer folks will have
> some feedback on this one?
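
To put numbers on the staggering described above, here is a small userspace
sketch (not kernel code). It assumes CPU n fires its once-per-second update
at tick n % HZ, and that CPUs are numbered contiguously within a node; all
function names are hypothetical. Under that model, 1024 CPUs at 250 HZ
gives 4 updates on most ticks and 5 on the first 24.

```c
#include <assert.h>

/* Assumed model: CPU n runs its update at tick (n % hz) of each second. */
unsigned int stagger_tick(unsigned int cpu, unsigned int hz)
{
	assert(hz > 0);
	return cpu % hz;
}

/* Most CPUs that can land on the same tick: ceil(ncpus / hz). */
unsigned int max_updates_per_tick(unsigned int ncpus, unsigned int hz)
{
	assert(hz > 0);
	return (ncpus + hz - 1) / hz;
}

/* Fewest CPUs that land on the same tick: floor(ncpus / hz). */
unsigned int min_updates_per_tick(unsigned int ncpus, unsigned int hz)
{
	assert(hz > 0);
	return ncpus / hz;
}

/*
 * Alternate per-node scheme: the first CPU of each node runs on the
 * second, each further CPU of that node one tick later. Assumes CPUs
 * are numbered contiguously within a node.
 */
unsigned int node_stagger_tick(unsigned int cpu, unsigned int cpus_per_node)
{
	assert(cpus_per_node > 0);
	return cpu % cpus_per_node;
}
```

With 4 CPUs per node, the per-node scheme only ever uses ticks 0 to 3, which
leaves the rest of the second free of these timers.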

The one-per-second timer interrupt will upset the people who are really
aggressive about power consumption (e.g., OLPC). Perhaps there isn't (yet)
an intersection between those people and SMP.

However, a knob to set the frequency would be nice, if it's not too
expensive to implement. Presumably anyone who cares enough will come along
and add one, but then they have to wait for a long period for that change
to propagate out to their users, which is a bit sad for something which we
already knew about.
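
A rough userspace sketch of what such a knob would trade off. The tunable
interval and the function name are hypothetical; the patch as posted
hard-codes one update per second.

```c
#include <assert.h>

/*
 * Per-CPU timer wakeups over a period, given a hypothetical tunable
 * interval between vmstat updates. Both arguments are in ticks.
 */
unsigned long timer_wakeups(unsigned long duration_ticks,
			    unsigned long interval_ticks)
{
	assert(interval_ticks > 0);
	return duration_ticks / interval_ticks;
}
```

At 250 HZ, a one-second interval costs each CPU 60 wakeups a minute, and a
ten-second interval costs 6.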

Having each CPU touch every zone looks a bit expensive - I'd have thought
that it would be showing up a little on your monster NUMA machines?

> @@ -648,11 +664,21 @@ static int __cpuinit vmstat_cpuup_callba
>  		unsigned long action,
>  		void *hcpu)
>  {
> +	long cpu = (long)hcpu;
> +
>  	switch (action) {
> -	case CPU_UP_PREPARE:
> -	case CPU_UP_PREPARE_FROZEN:
> -	case CPU_UP_CANCELED:
> -	case CPU_UP_CANCELED_FROZEN:
> +	case CPU_ONLINE:
> +	case CPU_ONLINE_FROZEN:
> +		start_cpu_timer(cpu);
> +		break;
> +	case CPU_DOWN_PREPARE:
> +	case CPU_DOWN_PREPARE_FROZEN:
> +		cancel_rearming_delayed_work(&per_cpu(vmstat_work, cpu));
> +		per_cpu(vmstat_work, cpu).work.func = NULL;
> +	case CPU_DOWN_FAILED:
> +	case CPU_DOWN_FAILED_FROZEN:
> +		start_cpu_timer(cpu);
> +		break;
>  	case CPU_DEAD:
>  	case CPU_DEAD_FROZEN:
>  		refresh_zone_stat_thresholds();

Oh dear. Some of these new notifier types are added by a patch which is a
few hundred patches later than slub. I can park this patch after that one,
but that introduces a risk that later slub patches will also get
disconnected.

Oh well, we'll see how things go.