Re: Very high load on P4 machines with 2.4.28

From: Marek Habersack
Date: Wed Jan 05 2005 - 06:36:39 EST


On Wed, Jan 05, 2005 at 06:28:41AM +0100, Willy Tarreau scribbled:
> Hi,
Hello,

> On Wed, Jan 05, 2005 at 12:07:33AM +0100, Marek Habersack wrote:
> > Interestingly enough, the machine with the highest load average is the
> > one generating 4Mbit/s and the one with 24Mbit/s has the smallest load
> > average value.
>
> This is common with multi-process servers like apache if the link is
> saturated, because data takes more time to reach the client, so you have
> a higher concurrency.
The link isn't saturated - we have a 200Mbit/s margin atm. It's not a
bandwidth problem, that's certain.

> > The latter also suffers from the biggest loadavg increase.
> > All of the virtual machines have iptables accounting chains for each
> > configured IP (there are 62 IP numbers on one and 32 on the other).
> > The virtual boxes have two 80GB SATA drives raided with softraid. The
> > non-virtual box has a single IDE drive, no raid.
>
> > (virtual #2, the 24Mbit/s one)
> > # vmstat
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 5 3 172448 13084 1208 304048 4 4 90 50 109 117 19 8 73 0
>
> I don't like something: with 73% idle, you have 5 processes in the rq. I think
> this machine writes logs synchronously to disks, or stores SSL sessions on a
The only synchronously written logs are auth.log and mail.err; SSL is there
indeed, but the site is hardly ever accessed (as of a while ago, the box had
a load of 0.75 while pushing out 14Mbit/s; with 2.4.28 last week it was
around 10.0 under the same conditions).

> real disk and waits for writes. A tmpfs would be a great help.
The only things writing to disk on a regular basis (apart from syslog and
the apache logs) are the PHP session files, one tdb database for traffic
data and mysql (which might be using fsync - could that be the cause of the
I/O slowness?). But, in any case, the machine behaves well under kernels
other than 2.4.28.
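
If the session files turn out to be the culprit, moving them onto tmpfs is
easy enough - a rough sketch (the mount point and size below are just
examples; session.save_path in php.ini has to match whatever path is used):

# mount -t tmpfs -o size=64m tmpfs /var/lib/php/sessions

plus a matching line in /etc/fstab so it survives reboots:

tmpfs /var/lib/php/sessions tmpfs size=64m 0 0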

> You can try to trace the processes activity with :
>
> # strace -Te write <process pid>
> It will display the time elapsed in each write() syscall, you'll find the
> fds in /proc/<pid>/fd. You may notice big times on logs or ssl sessions.
Nope... the times are in the range 0.000008s to 0.000045s...
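
I suppose I can rule mysql's fsync in or out the same way, by tracing just
the sync calls instead of write (assuming a single mysqld pid here, hence
pidof -s):

# strace -T -e trace=fsync,fdatasync -p `pidof -s mysqld`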

> > (the non-virtual)
> > # vmstat
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 60 0 70300 115960 0 369244 0 0 79 32 90 45 73 7 21 0
>
> Same note for this one, although it does more user space work (php? ssl?).
Poorly written Perl scripts.

> It's possible that some change in 2.4.28 touches the I/O subsystem and
> increases your wait I/O time in this particular application.
> (...)
Any clues as to where to look? I examined the 2.4.28 changelog and saw
nothing that would suggest such a change, but then I'm not a kernel hacker -
I might easily have missed something important.

> > One other interesting thing to note is that we have one
> > other box with the similar configuration to the virtuals (also a virtual
> > host) but it runs 2.4.28 with SMP+HT enabled - no load problems there at
> > all.
>
> So, to contradict myself, have you tried enabling HT on other boxes which
> suffer from the load ?
Yep, only one box (out of the ones with problems) boots fine with HT
enabled; the others just freeze. (We thought it could have been the
machines' BIOS, but updating it didn't help.)

> > Let me know if you need more info,
>
> You have sent quite enough info for now. Other than I/O work, I have no
> idea. You may want to play with /proc/sys/vm/{bdflush,max-readahead} and
> others to see if it changes things.
At this point I think we're going to run them under the older kernels and
wait for 2.4.29 to see whether the problem still exists there. If it does,
we'll try 2.6 on the machines, and if that doesn't help, we'll do some more
testing with 2.4.28. Our hands are tied, since these are production machines
and we cannot let them run with such degraded performance for too long...

> If your load is bursty, it might help to reduce the ratio of dirty blocks
> before flushing (first field in bdflush), because although writes will
> start more often, they will take less time.
What about nfract_sync? Does it make sense to make it smaller as well? I've
also decreased age_buffer to 15s.
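
Concretely, this is the sort of thing I have in mind (the nine bdflush
fields are nfract, ndirty, two unused, interval, age_buffer, nfract_sync,
nfract_stop_bdflush and one more unused, per Documentation/sysctl/vm.txt in
the 2.4 tree; the numbers below are only my current guesses, with age_buffer
at 1500 jiffies, i.e. 15s at HZ=100):

# echo "20 500 0 0 500 1500 40 20 0" > /proc/sys/vm/bdflush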

> I already have solved similar problems by disabling keep-alive to decrease
> the number of processes.
Disabling keep-alive is routine here... :) But that is unlikely to be the
cause, since it's evidently a kernel thing.
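
(For completeness, that's just the usual apache directive in httpd.conf:

KeepAlive Off

nothing fancier than that.)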

Well, I'll see what good the bdflush changes do to the machines while they
run under the "good" kernel, and we'll schedule some testing with 2.4.28 at
some point.

Thanks for your help, it's greatly appreciated!

best regards,

marek
