Re: [Lse-tech] Re: [PATCH] new CSA patchset for 2.6.8

From: Tim Schmielau
Date: Mon Aug 30 2004 - 03:31:55 EST


On Fri, 27 Aug 2004, John Hesterberg wrote:

> On Fri, Aug 27, 2004 at 07:42:18AM +0200, Guillaume Thouvenin wrote:
> > On Thu, Aug 26, 2004 at 10:05:37PM +0200, Tim Schmielau wrote:
> >
> > > With the new BSD acct v3 format, it should be possible to do per job
> > > accounting entirely from userspace, using pid and ppid information to
> > > reconstruct the process tree and some userland database for the
> > > pid -> job mapping. It would, however, be greatly simplified if the
> > > accounting records provided some kind of job id, and some indicator
> > > whether or not this process was the last of a job (group).
> >
> > I like this solution.
> > In fact what I proposed was to have PAGG and a modified BSD accounting
> > that can be used with PAGG as both are already in the -mm tree. But
> > managing groups of processes from userspace is, IMHO, a better solution
> > as modifications in the kernel will be minimal.
>
> The kernel part of linux-job is a module that uses PAGG, and
> isn't difficult. We've been running it in production for a
> couple years.

Well, I'm rethinking my opinion of not wanting two accounting methods in
the kernel. Make them share as much code as possible, with the only
remaining difference being the format of the record and whether it is
written per process or per job. Then we just have to make sure that both
mechanisms get exercised regularly, to prevent bit-rot.

> I don't think a kernel-based job is a requirement, though,
> so I'd like to hear more about how you'd do it otherwise.
>
> The other comments about only one acct record per job vs one
> per process might be important, and that might mean the kernel
> has to know about the job.

Yes, it would probably be easier if the kernel knew about the job and
could stuff a job ID into the acct record. If that means going from
64-byte records to 128 bytes, however, it would again double the already
larger overhead of BSD accounting. The fact that CSA is more lightweight
is why I am not opposed to its inclusion.
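
To put a number on the record-size concern, here is a rough sketch of the
arithmetic. The 64 and 128 byte sizes are the ones mentioned above; the
process count and the function name are hypothetical, chosen only for
illustration.

```python
# Back-of-the-envelope estimate of per-process accounting file growth.
# The process count below is a made-up figure, not a measurement.

def acct_bytes(num_processes: int, record_size: int) -> int:
    """Bytes written when one fixed-size record is emitted per process exit."""
    return num_processes * record_size

processes = 1_000_000                    # hypothetical number of exited processes
current = acct_bytes(processes, 64)      # 64-byte records
with_job_id = acct_bytes(processes, 128) # 128-byte records with room for a job ID

print(current, with_job_id, with_job_id / current)
```

Whatever the process count, the ratio stays at 2, which is the doubling
referred to above.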

On the other hand, there are a few uses (and users) of per-process
accounting records, e.g. for security auditing, so we should not back
it out of the kernel.

> How does the BSD accounting define jobs?
> What determines the job that a process is part of?

BSD accounting doesn't have the concept of a job at all. When we discussed
the v3 format, we considered adding a job ID field from PAGG, but a)
nobody answered and b) there wasn't any space left in the record anyway.
So the decision was postponed until a future 128-byte acct v4 structure.

> An important aspect of linux-job (ie the job part of the pagg/job/csa
> stack) is that it is inescapable. The user doesn't get to determine or
> change their job (unlike process groups). For true accounting, that
> determines the real $$$ chargebacks on shared machines, this is
> necessary.

My proposed solution for a userspace method would also be inescapable:
To start a new job, just write out its pid to a file that is only ever
appended to, and consider all children of it as belonging to the same job.

Inescapable, but probably some overhead in userspace.
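
A minimal sketch of how such a userspace mapping might look, assuming the
accounting records have already been parsed into (pid, ppid) pairs and the
append-only file into a set of job-root pids. The function name and data
layout are hypothetical, and pid reuse over time is ignored to keep the
example short.

```python
# Hypothetical sketch: assign each process to a job by walking ppid links
# upward until a pid listed in the append-only job file is reached.
from typing import Dict, Iterable, Optional, Set, Tuple

def assign_jobs(records: Iterable[Tuple[int, int]],
                job_roots: Set[int]) -> Dict[int, Optional[int]]:
    """Return {pid: job root pid, or None if no ancestor started a job}."""
    parent = dict(records)               # pid -> ppid from the acct records
    jobs: Dict[int, Optional[int]] = {}
    for pid in parent:
        seen = set()                     # guards against cycles in bad data
        cur: Optional[int] = pid
        job = None
        while cur is not None and cur not in seen:
            if cur in job_roots:         # nearest job-starting ancestor wins
                job = cur
                break
            seen.add(cur)
            cur = parent.get(cur)        # None once we leave the known records
        jobs[pid] = job
    return jobs

# Example: pid 100 started a job; 101 and 102 descend from it; 200 does not.
records = [(100, 1), (101, 100), (102, 101), (200, 1)]
print(assign_jobs(records, job_roots={100}))
```

A process cannot leave its job here, because the only inputs are its
recorded ancestry and a file it can only append to. The cost is the tree
walk, which is the userspace overhead mentioned above.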

Oh, wait - I think there is a problem in current BSD accounting, if the
parent process dies before the child, and the child gets reparented to
init. I should investigate that...
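
The suspected problem can be illustrated with a small sketch. It assumes
that the accounting record carries the ppid as seen when the process
exits, so a child reparented to init would be recorded with ppid 1; that
assumption is exactly what still needs checking. All names and pids here
are hypothetical.

```python
# Hypothetical illustration of the reparenting concern: the ppid chain back
# to the job root is lost if the child's record says its parent is init.
from typing import Dict, Optional, Set

def find_job(pid: Optional[int], parent: Dict[int, int],
             job_roots: Set[int]) -> Optional[int]:
    """Walk ppid links upward; return the job root, or None if none is found."""
    seen = set()
    while pid is not None and pid not in seen:
        if pid in job_roots:
            return pid
        seen.add(pid)
        pid = parent.get(pid)
    return None

job_roots = {100}

# Parent 101 outlives child 102: the chain 102 -> 101 -> 100 is intact.
intact = {100: 1, 101: 100, 102: 101}

# Parent 101 exits first; child 102 is reparented to init, so its record
# would carry ppid 1 (assumption) and the link to job 100 disappears.
orphaned = {100: 1, 101: 100, 102: 1}

print(find_job(102, intact, job_roots))
print(find_job(102, orphaned, job_roots))
```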

> Another aspect of jobs that isn't directly related to accounting
> is that it gives users and admins a way to query, and kill :-),
> all the processes that are part of the job. The inescapable part
> is again important...you can't fork off a process and detach it from
> the job to hide it. In fact, I've heard that some sites use pagg/job
> without CSA for this reason. It might have been an ISP or ASP, and
> they liked the containment linux-job provided.

Yes, it's probably a lot easier if you don't have to search accounting
files to do that.

So from my view, we might turn the discussion from whether we want CSA to
how we integrate it, i.e. do some code review.

Tim