Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

From: Paul Jackson
Date: Mon Jul 03 2006 - 00:53:16 EST


Shailabh wrote:
> Sends a separate "registration" message with cpumask to listen to.
> Kernel stores (real) pid and cpumask.

Question:
=========

Ah - good.

So this means that I could configure a system with a fork/exit
intensive, performance critical job on some dedicated CPUs, and be able
to collect taskstat data from tasks exiting on the -other- CPUS, while
avoiding collecting data from this special job, thus avoiding any
taskstat collection performance impact on said job.

If I'm understanding this correctly, excellent.

Caveat:
=======

Passing cpumasks across the kernel-user boundary can be tricky.

Historically, Unix has a long tradition of boloxing up the passing
of variable length data types across the kernel-user boundary.

We've got perhaps a half dozen ways of getting these masks out of the
kernel, and three ways of getting them (or the similar nodemasks) back
into the kernel. The three ways being used in the sched_setaffinity
system call, the mbind and set_mempolicy system calls, and the cpuset
file system.

All three of these ways have their controversial details:
* The kernel cpumask mask size needed for sched_setaffinity calls is
not trivially available to userland.
* The nodemask bit size is off by one in the mbind and set_mempolicy
calls.
* The CPU and Node masks are ascii, not binary, in the cpuset calls.

One option that might make sense for these task stat registrations
would be to:
1) make the kernel/sched.c get_user_cpu_mask() routine generic,
moving it to non-static lib/*.c code, and
2) provide a sensible way for user space to query the size of
the kernel cpumask (and perhaps nodemask while you're at it.)

Currently, the best way I know for user space to query the kernels
cpumask and nodemask size is to examine the length of the ascii
string values labeled "Cpus_allowed:" and "Mems_allowed:" in the file
/proc/self/status. These ascii strings always require exactly nine
ascii chars to express each 32 bits of kernel mask code, if you include
in the count the trailing ',' comma or '\n' newline after each eight
ascii character word.

Probing /proc/self/status fields for these mask sizes is rather
unobvious and indirect, and requires caching the result if you care at
all about performance. Userland code in support of your taskstat
facility might be better served by a more obvious way to size cpumasks.

... unless of course you're inclined to pass cpumasks formatted as
ascii strings, in which case speak up, as I'd be delighted to
throw in my 2 cents on how to do that ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/