Bad apache perfomance wtih linux SMP

Juergen Schmidt (ju@ct.heise.de)
Mon, 17 May 1999 12:54:43 +0200


Hello,

first of all, please excuse this off-topic posting. I'm doing a test
with apache on a 4CPU-Box (Siemens Primergy 870) with Linux/apache and
NT/IIS. Now I got some *very* strange results and want to make sure,
that you -- the developers -- have the occasion to comment on them,
*before* I publish them.
If there's a better way to do this, please feel free to tell me.

So now to the facts:

I've got a Siemens Primergy 870 with:

4 CPUs PII Xeon 450 MHz
2 GByte RAM
Mylex RAID Controler DAC 960 (64MB RAM), RAID5
2 x Intel EEpro 100 (only one is used, the other is not ifconfig'ed but
detected)

Linux 2.2.8 (SuSE 6.1., glibc based)
Apache 1.3.6
(NT 4.0 SP4, IIS 4.0)

Network: switched 100 MBit/s, half duplex

Neither system is intensivly tuned -- apart from the obvious stuff, like
disabling not needed modules, turning off hostname lookups, ...

I'm measuring plain HTTP-GET on a static html-file with 8
(Linux-)clients each running up to 64 processes (for a total of 512)
doing HTTP-GET-requests in a tight loop. All files come from a partition
on the RAID-array (RAID 5) which is used for logging, too. File size is
4KByte (I measured 1k and 8k with similar results).

Results:

-- with one CPU both NT/IIS and Linux/apache deliver about the same
perfomance (+- 10%). Even Linux-SMP and Non-SMP only differ by < 10%.

-- with 4 CPUs NT/IIS gives sligthly more Reqeusts/sec (< 10%): that's
mainly because my setup doesn't support real heavy load by doing plain
HTTP-GETs. One CPU obviously is sufficient for this task here (as in
most real life setups with plain HTML serving)

-- Linux with 4 CPUs is disastrous: It delivers significantly less RPS
than the single-CPU version -- about a factor of 4 ! Only at high loads
(256 procs und up) it catches up.

One other strange thing (but I still have to doublecheck this):

- Linux 4 CPUs: I get slightly better perfomance if the processes fetch
a random page (out of 10000 files, so that they still fit in the buffer
cache). All the other combinations (Linux 1 CPU, NT) are a bit slower in
comparison to fetching only one page.

For clarity I included a picture (it says more than a lot of words.

The red line is the perfomance of linux with 4 CPUs, blue with random
files, green Linux one CPU, pink is NT 1 CPU (4 CPUS are slightly above
the linux curve). The drop at 512 processes is due to connect()-failures
-- I'm currently investigating this too.

Do you have any ideas, what's happening there?
Or even better, how to fix this?

It seems to me, that this might exactly be the factor 4 those mindcraft
people have measured. Do you think, this is possible?

bye, juergen

BTW: In another test with cgi-scripts I set NR_OPEN in the linux kernel
from 1024 to 2048 (in include/linux/limits.h, fs.h and posix_types.h)
and recompiled apache.
So I got rid of the open() errors that occured under heavy load. But
therefore I get only about half the rps with <=128 processes. Did I
forget something?

PS: Please CC me per mail, as I have not subscribed to the list.

-- 
Juergen Schmidt   Redakteur/editor  c't magazin      PGP-Key available
Verlag Heinz Heise GmbH & Co KG, Helstorferstr. 7, D-30625 Hannover
EMail: ju@ct.heise.de - Tel.: +49 511 5352 300 - FAX: +49 511 5352 417

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/