Re: Some questions about linux kernel.

From: Jesse Pollard (pollard@cats-chateau.net)
Date: Sun Mar 19 2000 - 14:00:03 EST


On Sat, 18 Mar 2000, Horst von Brand wrote:
>Jesse Pollard <pollard@cats-chateau.net> said:
>
>[...]
>
>> It makes it better. The specific process causing the problem is now
>> identified.
>
>The only thing you identify this way is the first unsuspecting victim which
>happens to step into your trap. The process itself may very well be
>innocent, it just finds itself in a position where making a legitimate
>request at the wrong moment gets it shot. It depends on all the other stuff
>the system is doing at the time: Running NS on an idle, 32MB machine will
>work; if it is also compiling a kernel and rebuilding XFree86 ATM,
>something will have to give. Who is at fault? NS isn't, it is doing its
>job. Neither are the make, sh, or gcc and whatnot processes doing anything
>illegal. Note that all this is completely unrelated to user quotas and
>such: User quotas just shift the problem to the individual users, some of
>whose innocent processes get killed because of the other things they are
>doing. It can be argued that the user is being held responsible this way,
>and this is in a sense true. But the process in itself is doing nothing
>wrong.

Except that the user IS exceeding what was made available via quotas.

>This is one of the reasons why OOM handling is hard: Very often there _is_
>no single culprit that can be blamed and clearly deserves to be shot, there
>is just a bunch of innocent people some of whom you have to kill or else
>all are dead.

This is why resource allocation and quotas are important in multiuser systems.

>You also try to protect the system against malicious (or runaway)
>processes. To protect against a runaway memory leak (or dumb mallocbomb)
>is rather easy, essentially you shoot the largest/fastest growing process.
>Protection against malicious processes is much, much harder, as they _will_
>take advantage of blind spots in the detection scheme.

Unfortunately, that may not be the right one to shoot. The "mallocbomb" may
not be growing very fast - but the web server might be, compared to the bomb.
Or it might be telnetd/login starting a new session. Or cron starting
a new job. Or inetd starting a new ftpd. Or init trying to restart a
getty on the console which just logged out...
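
To make the point concrete, here is a minimal sketch in Python of a
"shoot the fastest growing process" rule. It is not the kernel's OOM
handling; the process list, the sizes, and the pick_victim() helper are all
made up for illustration.

```python
# Hypothetical snapshot of the system at the moment memory runs out:
# (name, resident size in MB, growth in MB over the last sampling interval).
# All numbers are invented for illustration.
processes = [
    ("mallocbomb", 20, 1),  # leaks slowly
    ("httpd",      60, 8),  # busy web server, legitimately growing
    ("ftpd",        2, 2),  # just started by inetd
    ("getty",       1, 1),  # just restarted by init
]


def pick_victim(procs):
    """Naive rule: pick the process with the largest recent growth,
    breaking ties by total size."""
    return max(procs, key=lambda p: (p[2], p[1]))


victim = pick_victim(processes)
print(victim[0])
```

With these numbers the rule selects httpd, not the slow leak, which is
exactly the failure mode described above.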

>Then there is the fact that this is (or should be, at least for a
>well-speced machine) a very rare event, so very little (if any) effort
>during normal operation is warranted. This includes the OS and machine, and
>even much more importantly the sysadmin.

It cannot be spec'ed without having something to measure against. This is
the other place resource quotas come in.

This can be compared to disk quotas. Nobody will really fault the use
of disk quotas.
1. How many file systems are overcommitted?
2. Whose decision was that?
3. Who detects the failure?
4. Who gets the blame when it occurs?

If filesystems are not overcommitted, then the total of all users' quotas
will at most equal the size of available data space (meta-data included as
"data" here). An overcommitted file system is one where the total of
all users' quotas exceeds the size of the filesystem.

Filesystem failure doesn't occur until enough users use up the space.
No single user may have even reached their limit. But random write
failures occur, databases may become corrupt, and a lot of blame
is going to be assigned.
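
A minimal sketch in Python of that definition and failure mode. The user
names, quota sizes, filesystem size, and both helper functions are
assumptions made up for illustration; real quota systems track blocks and
inodes per filesystem and are more involved.

```python
def is_overcommitted(quotas, fs_size):
    """True when the sum of all users' quotas exceeds the filesystem size."""
    return sum(quotas.values()) > fs_size


def first_failure(quotas, fs_size, writes):
    """Replay (user, blocks) writes in order.

    Returns (user, usage_so_far) for the first write that fails because the
    filesystem is full, or None if every write fits.
    """
    usage = {user: 0 for user in quotas}
    free = fs_size
    for user, blocks in writes:
        if usage[user] + blocks > quotas[user]:
            continue  # refused by the user's own quota, not a filesystem failure
        if blocks > free:
            return user, dict(usage)
        usage[user] += blocks
        free -= blocks
    return None


# Hypothetical numbers: three users with 60 blocks each on a 100 block filesystem.
quotas = {"alice": 60, "bob": 60, "carol": 60}
fs_size = 100
writes = [("alice", 40), ("bob", 40), ("carol", 30)]

overcommitted = is_overcommitted(quotas, fs_size)
failure = first_failure(quotas, fs_size, writes)
print(overcommitted, failure)
```

Here the filesystem is overcommitted (180 blocks of quota on 100 blocks of
space), and carol's write fails while every user, carol included, is still
under quota.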

Answers with quota controls:
  #2: Overcommitting is a management choice.
  #3: The first user that fills the filesystem without using up their quota.
  #4: Management.

Without quotas being available:
  #2: The analyst that selected the system.
  #3: The first user that fills the filesystem.
  #4: The operating system.

Do I use quotas? YES.
Do I have overcommit? YES - management said to do so.
Does the system or analyst get blamed for failure? NO.

I need the same controls on Linux servers that I have on other servers.

I can be a lot more lax on restrictions where the system is dedicated
to a single user than I can where it is a multiuser server.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@cats-chateau.net

Any opinions expressed are solely my own.



