Re: RFC vmstat: On demand vmstat threads

From: Gilad Ben-Yossef
Date: Thu Sep 19 2013 - 12:55:20 EST


On Wed, Sep 18, 2013 at 5:06 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 10 Sep 2013 21:13:34 +0000 Christoph Lameter <cl@xxxxxxxxx> wrote:
>

>> With this patch it is possible then to have periods longer than
>> 2 seconds without any OS event on a "cpu" (hardware thread).
>
> It would be useful (actually essential) to have a description of why
> anyone cares about this. A good and detailed description, please.


Let me have a stab at this:

The existing vmstat_update mechanism depends on a deferrable timer that
fires every second by default and queues a work item to run on the
local CPU. The result is one interrupt and one additional schedulable
task on each CPU approximately every second.

If your workload actually causes VM activity, or you are running
multiple tasks per CPU, you probably have bigger issues to deal with.

However, many existing workloads dedicate a CPU to a single CPU-bound
task. This is done by high performance computing folks, by high
frequency financial applications folks, and by networking folks (Intel
DPDK, EZchip NPS). With the advent of systems with more and more CPUs,
this will(?) become more and more common, since when you have enough
CPUs you care less about efficiently sharing a CPU with other tasks
and more about efficiently monopolizing a CPU per task.

The difference made by having this timer fire and the workqueue kernel
thread scheduled every second can be enormous. I made an artificial
test measuring the worst case time to do a simple "i++" in an endless
loop, on a bare metal system and under Linux on an isolated CPU
(cpusets or isolcpus - take your pick) with dynticks, with and without
this patch. With this patch Linux matches the bare metal performance
(~700 cycles); without it, it loses by a couple of orders of magnitude
(~200k cycles)[*] - and all this for something that just calculates
statistics. For networking applications, for example, this is the
difference between dropping packets and sustaining line rate.
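
For illustration only, a minimal userspace sketch of this kind of
measurement. It is not the actual test described above: it assumes a
POSIX system, reports nanoseconds from clock_gettime() rather than
cycles, runs a bounded number of iterations instead of an endless loop,
and the names now_ns() and worst_case_gap_ns() are made up for this
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds (hypothetical helper). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Do "i++" `iterations` times, taking a timestamp after each increment,
 * and return the largest gap observed between two consecutive
 * timestamps. On a quiet CPU the gap stays close to the cost of one
 * clock read; a timer interrupt or a scheduled kernel thread shows up
 * as a much larger worst case.
 */
static uint64_t worst_case_gap_ns(uint64_t iterations)
{
    volatile uint64_t i = 0;   /* volatile so the increment is not optimized away */
    uint64_t worst = 0;
    uint64_t prev = now_ns();

    while (i < iterations) {
        i++;
        uint64_t cur = now_ns();
        if (cur - prev > worst)
            worst = cur - prev;
        prev = cur;
    }
    return worst;
}
```

To compare configurations, one would call worst_case_gap_ns() from a
program pinned to the isolated CPU (for example with taskset) and
compare the returned worst case with and without the patch.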

Statistics are important and useful, but it would be great if there
were a way to keep statistics gathering from causing such a huge
performance difference. This is what we are trying to do here.

Does it make sense?

[*] To be honest it required one more patch, but this one or something
like it is needed to get that one working, so...

Thanks,
Gilad





--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
-- Jean-Baptiste Queru