Re: High load average on disk I/O on 2.6.17-rc3

From: Jason Schoonover
Date: Sun May 07 2006 - 13:32:32 EST


Hi Andrew,

I see, so it sounds as though the load average is not reflecting the true load
on the machine. However, it still feels as though the copy is consuming quite
a bit of resources. The machine becomes almost unresponsive if I start two or
three copies at the same time.

I noticed the behavior initially when I installed VMware Server: I started one
of the VMs booting and was copying another in the background (about a 12GB
directory). The booting VM started getting slower and slower and
eventually just hung. It wasn't locked up; it just seemed like it was "paused."
When I tried an "ls" in another window, it just hung. I then tried to ssh
into the server to open another window and I couldn't even get an ssh prompt.
I had to eventually Ctrl-C the copy and wait for it to be done before I could
do anything. And the load average had skyrocketed, but the consensus here is
definitely that it's not the true load average of the system.

Should I perhaps revert to an older kernel? 2.6.12 or 2.6.10 maybe? Do
you know roughly when the I/O handling was changed?

I can certainly help debug this issue if you (or someone else) have the time to
look into it and fix it. Otherwise I will just revert and hope that it
gets fixed in the future.

Thanks,
Jason


-----Original Message-----
From: Andrew Morton
Sent: Sunday 07 May 2006 09:50
To: Jason Schoonover
Subject: Re: High load average on disk I/O on 2.6.17-rc3

On Fri, 5 May 2006 10:10:19 -0700
Jason Schoonover <jasons@xxxxxxxxxxxxxxx> wrote:

> I'm having some problems on the latest 2.6.17-rc3 kernel and SCSI disk I/O.
> Whenever I copy any large file (over 500GB) the load average starts to
> slowly rise and after about a minute it is up to 7.5 and keeps on rising
> (depending on how long the file takes to copy). When I watch top, the
> processes at the top of the list are cp, pdflush, kjournald and kswapd.

This is probably because the number of pdflush threads slowly grows to its
maximum. This is bogus, and we seem to have broken it sometime in the past
few releases. I need to find a few quality hours to get in there and fix
it, but they're rare :(

It's pretty harmless though. The "load average" thing just means that the
extra pdflush threads are twiddling thumbs waiting on some disk I/O -
they'll later exit and clean themselves up. They won't be consuming
significant resources.
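
One way to check this on a running system is to compare the load average with
the number of tasks that are runnable versus blocked in uninterruptible sleep
(state "D", typically waiting on disk I/O), since Linux counts both toward the
load average. The following is a minimal sketch, not a diagnostic tool: the
function names are made up for illustration, it assumes a Linux /proc
filesystem, and it takes only a single snapshot, so the counts are approximate.

```python
import glob
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')' rather than on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_states(stat_lines):
    """Count how many tasks are in each state (R, S, D, ...)."""
    return Counter(task_state(line) for line in stat_lines)


def read_stat_lines():
    """Read the stat line of every process currently listed in /proc."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    states = count_states(read_stat_lines())
    print("runnable (R):", states["R"])
    print("uninterruptible (D):", states["D"])
```

If the load average is high while the "R" count stays low and the "D" count is
high, the load figure is mostly made up of tasks waiting on I/O rather than
tasks competing for CPU.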