Re: How to increat [sic.] max open files?

Dean Gaudet (dgaudet-list-linux-kernel@arctic.org)
Mon, 6 Jan 1997 19:14:39 -0800 (PST)


Given that I've been involved with the Apache group for a while now, I can
comment on part of this:

On Fri, 3 Jan 1997, James L. McGill wrote:
> > I think that a task, process, program, etc., that needs more than 100
> > file handles is improperly written.
>
> I agree with that. But Apache Httpd is a pretty well written program.
> It's widely accepted as the best solution for a large scale web server.
> We have never had a problem that can be attributed to a problem with
> Apache. I do not intend to take on the task of rewriting it, but I have
> notified the developers of this problem.

The first statement (sorry, lost the attribution) is patently false.
There are security and performance reasons that apache must open the log
files once, and then keep them open for the entire duration of its run.
If the configuration demands that each vhost have its own logs (say, at an
ISP that is providing virtual hosting), then it will need as many fds per
vhost as there are logs open (only two are really required per vhost -- an
error log, and a configurable hit log).

The performance reasons I'm sure everyone can figure out. The security
reasons? They're even more important. Apache opens the log files while
it is still running as root, and afterwards gives up its privs. By doing
this it can have log files that *are not writeable by the httpd userid*.
If it did have log files writeable by the httpd userid, and it allowed
users to write their own CGIs... then it would be trivial for users to
overwrite the logs. Now consider that many sites use the byte counts in
the access logs to determine how much to charge the user, and consider
that users probably want an idea of the hits going to their site.
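
To make that concrete, here's a minimal sketch of the pattern (this is not
Apache's actual source -- the path and the uid/gid below are placeholders I
picked for the example): open the log while still root, drop privileges, and
keep writing through the already-open fd.

/* Sketch of the open-then-drop-privileges pattern described above.
 * Not Apache source; path and uid/gid values are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    /* While still root: open the access log.  Root ownership and mode
     * 0644 mean the unprivileged httpd userid cannot rewrite it. */
    int log_fd = open("/var/log/httpd/access_log",
                      O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (log_fd < 0) {
        perror("open access_log");
        exit(1);
    }

    /* Drop privileges: group first, then user (uid/gid 99 standing in
     * for "nobody" here, purely for the example). */
    if (setgid(99) < 0 || setuid(99) < 0) {
        perror("setgid/setuid");
        exit(1);
    }

    /* From here on the process can no longer reopen the log for
     * writing, but the already-open fd works for the life of the run. */
    const char line[] = "example log entry\n";
    write(log_fd, line, sizeof(line) - 1);
    return 0;
}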

Now, again, is apache improperly written? No. In fact we've considered
putting in *gross hacks* to work around these and similar limitations. At
least linux is pretty nice in this respect -- if you rebuild your entire
system you can be safe with a higher limit. But Solaris, for example, has
a limit of 256 fds that will work with FILE *s; beyond that, FILE * breaks
badly. Apache still uses FILE *, and is loath to tell people they should
install a better stdio library. Sun can't fix this easily without
breaking binary compatibility (although there is a hack they can do,
similar to what SGI did on IRIX).
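
(As I understand it, the Solaris FILE * breakage comes from stdio keeping
the descriptor in a single byte inside the FILE structure -- hence the
binary-compatibility problem with fixing it.)

Back to the original question: within whatever ceiling your kernel was
built with, a process can at least query its limit and raise its own soft
limit with getrlimit()/setrlimit(). A rough sketch -- nothing Linux-specific
in the calls themselves, and it can't take you past the hard limit, which
on a stock kernel is exactly the compile-time ceiling people rebuild to
change:

/* Query the per-process open-file limit and raise the soft limit as
 * far as the hard limit allows. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft limit: %lu, hard limit: %lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    /* Bump the soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &rl) < 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}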

Sure, I can give you an example Apache configuration that requires only two
fds total for logging. But many sites don't want to keep all the logs in one
huge log -- because their users want to be able to "tail -f" their private
log. Not to mention the annoyance of debugging a CGI when you're sharing
an error_log with a hundred other sites.

What about MUDs and IRC servers? They have legitimate reasons for needing thousands
of sockets open at the same time.

The only thing I'm arguing against here is the "I can't see a reason for
it, therefore there is no reason for it" attitude... I personally haven't
had a problem building my system for greater than 256 handles, although I
would have appreciated it being easier.

> Agreed. The reason that an HTTPD server must run as one big process
> is because it must bind all those addresses to one port: 80.

If you are using IP-based vhosts, then you can use BindAddress to bind to
specific IPs, at the cost of 1 fd per vhost. This would allow you to run
multiple httpd parents on the same machine.
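
To spell out where that 1 fd goes: each address a parent binds is its own
listening socket. A rough sketch at the socket level (the address below is
a placeholder, and a real httpd obviously does a great deal more):

/* One listening socket -- and therefore one fd -- bound to a specific
 * vhost IP on port 80.  The address is a placeholder. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int listen_on(const char *ip)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);            /* the web port */
    addr.sin_addr.s_addr = inet_addr(ip); /* this vhost's address */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        perror("bind/listen");
        close(fd);
        return -1;
    }
    return fd;  /* one fd consumed per bound vhost address */
}

int main(void)
{
    /* Each separate httpd parent would do this for its own address. */
    int fd = listen_on("192.0.2.10");
    if (fd >= 0)
        close(fd);
    return 0;
}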

> > Each time a daemon closes its last file, it expires. Now, you have 100
> > daemons when you need them and 1 daemon when you only need it.

There is a degenerate case: consider 100 daemons running with 1 file open
each. That's a bit inefficient.

Dean