Re: filesystem corruption caused by process accounting

Winfried Truemper (truemper@mi.uni-koeln.de)
Sun, 19 May 1996 15:57:07 +0200 (MET DST)


On Sat, 18 May 1996, Juha Virtanen wrote:

jiivee> Yes it is. Execute /sbin/accton without arguments to turn off
jiivee> process accounting. Then free some disk space and turn on process
jiivee> accounting again.

That works fine as long as the partition the log-file is on has enough
free space.

I played around with process accounting the second time it occurred
and found the script "/etc/init.d/acct", which calls "/usr/sbin/accton"
with no parameters. And yes, I used it yesterday (before reporting
the error), but it didn't help.

jiivee> If debian doesn't provide any periodical cleanup utilities for
jiivee> process accounting, you should write such a script and run it
jiivee> periodically via cron.

There is a script included in debian and it rotates (instead of just
deleting or cutting) the file. But maybe rotating it once a week is
not sufficient.
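
If weekly rotation turns out to be too rare, a small check run from
cron could decide when an extra rotation is due. The following is only
a sketch in Python, not the script Debian ships: the function name and
both thresholds are made up for illustration, and the rotation itself
(switching accounting off, moving the file, switching it on again) is
left out.

```python
import os
import shutil


def needs_rotation(log_path, max_log_bytes, min_free_bytes):
    """Return True if the log is too large or its partition is nearly full.

    Hypothetical helper: the thresholds are arbitrary values chosen by
    the caller, not anything defined by the accounting tools.
    """
    # A missing log file counts as size 0 (nothing to rotate yet).
    log_size = os.path.getsize(log_path) if os.path.exists(log_path) else 0

    # Free space on the partition that holds the log's directory.
    log_dir = os.path.dirname(os.path.abspath(log_path))
    free_bytes = shutil.disk_usage(log_dir).free

    return log_size > max_log_bytes or free_bytes < min_free_bytes
```

Called as, say, needs_rotation("/var/account/pacct", 10 * 1024 * 1024,
20 * 1024 * 1024), it would ask for a rotation once the file passes
10 MB or `/var' drops below 20 MB free; both numbers are examples only.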

jiivee> :> I deleted the file `/var/account/pacct' but the space did not
jiivee> :> free up (around 50MB). `/var' remains "full" (0 bytes free
jiivee> :> even for root).
jiivee>
jiivee> Sure, as file is not deleted from file system until it is
jiivee> closed. Kernel keeps active process accounting file continuously
jiivee> open.

I know it was not a good idea to delete the file before switching off
process accounting, but I told the whole story for completeness.
Yesterday, I _did_ switch off PA _before_ deleting the file, but it
didn't make any difference.

I know it (switching off before deleting) _works_ when there is
_enough_ space on the partition the log-file is on, but it seems that
it doesn't work when there isn't enough space.
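
The behaviour jiivee describes (space is not released while the file
is still open) is ordinary POSIX semantics and can be reproduced from
user space. A minimal sketch in Python, using a temporary file as a
stand-in for the accounting file; the function name is hypothetical,
and it says nothing about what the kernel does in the full-partition
case:

```python
import os
import tempfile


def inspect_unlinked_open_file(size_bytes=65536):
    """Delete a file while it is still open and look at it through the
    open descriptor.

    Returns (name_still_exists, link_count, size_in_bytes).
    Assumes a POSIX system; a temporary file stands in for the log.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size_bytes)
        f.flush()
        os.unlink(path)                 # the directory entry is gone ...
        info = os.fstat(f.fileno())     # ... but the inode is still alive
        return os.path.exists(path), info.st_nlink, info.st_size
    # Leaving the "with" block closes the file; only then can the
    # filesystem release the blocks.
```

On a POSIX system this should report that the name no longer exists
and the link count is 0, while the size is unchanged, i.e. the data
still occupies the disk until the last descriptor is closed.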

jiivee> :> And even worse, the space freed up by deleting huge files in
jiivee> :> /var/log was consumed by a rate of several dozen kb/s. Bummer!
jiivee>
jiivee> Huh? Then your machine ended thousands of processes per second
jiivee> (entries are written on process termination time and 18 entries
jiivee> consume 1008 bytes disk space). Possibly quite a lot of stuff
jiivee> was written to syslog files at that time?

No, I don't think that there were so many processes that they could
cause such a growth of the log-file. But I cannot tell for sure.
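
As a rough check of the figures quoted above: 1008 bytes for 18
entries is 56 bytes per entry. Reading "several dozen kb/s" as
36 KiB/s (my assumption, the real rate was not measured), that works
out to roughly 658 terminated processes per second. A short Python
sketch of the arithmetic, with a hypothetical function name:

```python
# Size of one accounting entry, from the quoted figure
# "18 entries consume 1008 bytes".
ENTRY_BYTES = 1008 / 18


def terminations_per_second(growth_bytes_per_second):
    """Process terminations per second needed to explain a given growth
    rate of the accounting file, assuming nothing else writes to it."""
    return growth_bytes_per_second / ENTRY_BYTES


if __name__ == "__main__":
    assumed_rate = 36 * 1024  # assumed reading of "several dozen kb/s"
    print(ENTRY_BYTES)
    print(terminations_per_second(assumed_rate))
```

So even at the low end, a growth rate like that would need several
hundred process terminations per second.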

jiivee> I haven't seen that kind of filesystem corruption ever. A few
jiivee> times syslog files have _completely_ filled up /var (where the
jiivee> process accounting file also resides), but no filesystem
jiivee> corruption has occurred.

I am not referring to "syslog". It was process accounting that filled
up `/var'.

I don't know the details, but "syslog" seems to be a user-space
process to me, whereas PA is something in kernel-space (?). Maybe
that makes a difference?

jiivee> :>Two related bugs:
jiivee> :> - the output of "pstree" is totally messed up
jiivee>
jiivee> I suppose this is separate problem.

Maybe, but then it is triggered by the error above. To make it
absolutely clear: "pstree" _works fine_ on my system the rest of the
time.

And this problem does not have anything to do with the overflowed
`/var' partition: I unmounted and fsck'ed it ('-f' to force), then
mounted it again and looked at the output of "pstree": still messed
up.

Winfried