Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Jay Lan
Date: Thu Jun 29 2006 - 15:31:42 EST


Andrew Morton wrote:
On Thu, 29 Jun 2006 09:44:08 -0700
Paul Jackson <pj@xxxxxxx> wrote:


You're probably correct on that model. However, it all depends on the actual
workload. Are people who have large-CPU (>256) systems really running
fork()-heavy things like webservers on them, or are they running things
like database servers and computations, which tend to have persistent
processes?

It may well be mostly as you say - the large-CPU systems not running
the fork()-heavy jobs.

Sooner or later, someone will want to run a fork()-heavy job on a
large-CPU system. On a 1024 CPU system, it would apparently take
just 14 exits/sec/CPU to hit this bottleneck, if Jay's number of
14000 applied.
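
The arithmetic behind that estimate can be checked with a minimal sketch.
It assumes the figure of 14000 is the system-wide number of exits per
second at which the bottleneck appears (an interpretation taken from
context, not stated in this message). The helper names are hypothetical
and are not part of taskstats.

```c
/* Hypothetical helper: system-wide exit rate the collector must absorb
 * when every CPU produces the same number of exits per second. */
static unsigned long total_exit_rate(unsigned long ncpus,
                                     unsigned long exits_per_sec_per_cpu)
{
    return ncpus * exits_per_sec_per_cpu;
}

/* Hypothetical helper: per-CPU exit rate at which a system-wide limit
 * is reached, rounded up to a whole number of exits. */
static unsigned long per_cpu_rate_at_limit(unsigned long limit,
                                           unsigned long ncpus)
{
    return (limit + ncpus - 1) / ncpus;
}
```

With 1024 CPUs and 14 exits/sec/CPU, total_exit_rate() gives 14336, just
over the assumed 14000 limit, and per_cpu_rate_at_limit(14000, 1024)
gives 14.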

Chris Sturdivant's reply is reasonable -- we'll hit it sooner or later,
and deal with it then.



I agree, and I'm viewing this as blocking the taskstats merge: if this
_is_ a problem then it's a big one, because fixing it will be intrusive
and might well involve userspace-visible changes.

The only ways I can see of fixing the problem generally are to either

a) throw more CPU(s) at stats collection: allow userspace to register for
"stats generated by CPU N", then run a stats collection daemon on each
CPU or

Clearly this approach (or the per-cpuset as Paul suggested) can solve
large-CPU system issues. As technology advances, this _WILL_ become a
problem sooner or later.

However, the taskstats header carries a version number. Would a change like
this be too intrusive to add in a later version?

Regards,
- jay



b) make the kernel recognise when it's getting overloaded and switch to
some degraded mode where it stops trying to send all the data to
userspace - just send a summary, or a "we goofed" message or something.
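
Option (b) could look roughly like the following. This is only a
userspace model of the idea, under stated assumptions: a fixed limit of
full records per one-second window, with everything beyond the limit
folded into a count that a later summary message would report. The
struct, the function names and the windowing policy are all hypothetical;
none of this is existing taskstats code.

```c
#include <stdbool.h>

/* Hypothetical per-listener state for a degraded mode. */
struct exit_throttle {
    unsigned long window_start; /* second in which the current window began */
    unsigned long sent;         /* full records sent in the current window */
    unsigned long dropped;      /* records suppressed since the last summary */
    unsigned long limit;        /* assumed max full records per second */
};

/* Returns true if a full per-task record should be sent now, false if
 * the event should only be counted towards the next summary. */
static bool throttle_admit(struct exit_throttle *t, unsigned long now_sec)
{
    if (now_sec != t->window_start) {
        /* A new one-second window starts: reset the per-window count. */
        t->window_start = now_sec;
        t->sent = 0;
    }
    if (t->sent < t->limit) {
        t->sent++;
        return true;
    }
    t->dropped++;
    return false;
}

/* Returns the number of suppressed records and resets the count, as a
 * summary ("we goofed") message would when it is emitted. */
static unsigned long throttle_take_summary(struct exit_throttle *t)
{
    unsigned long n = t->dropped;

    t->dropped = 0;
    return n;
}
```

In this model the listener always learns how many records it missed, so
it can tell that its data is incomplete for that interval.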
