On Mon, 2008-08-25 at 12:49 +0300, Török Edwin wrote:
On 2008-08-25 12:23, Peter Zijlstra wrote:
On Mon, 2008-08-25 at 10:04 +0300, edwin wrote:
Are you referring to the mmap_sem lock, or my mutex lock around all_thread_time?
Peter Zijlstra wrote:
No, I think I know what's going on..
On Mon, 2008-08-25 at 00:01 +0300, Török Edwin wrote:
Sorry, I forgot to include the .config; it's at the end of this mail (the cfs debug info output included the .config though).
Hi Ingo,
When I run clamd (www.clamav.net), I can only load my CPU to 50% (according to top) and my disks to 30% (according to iostat -x 3), regardless of how many threads I set (I tried 4, 8, 16, 32).
Can you share your .config, and perhaps tell what kernel version did work for you?
Well, I just bought this new box, so there isn't a kernel version that I know that worked on this hardware (but I am trying to boot some older versions now).
However on my previous box (Athlon64, non-SMP) I have never seen such a problem (that the CPU is loaded only 50% with clamd) and I've been
running 2.6.26 and 2.6.27-rc4 there too.
Details below, short summary here:
2.6.24: WORKS, clamd 400% CPU, testprogram runs in 27.4 seconds, 67% CPU load; and 28.5 seconds w/o setting affinity
2.6.25+: DOES NOT WORK, clamd 200%-300% CPU, testprogram runs in 38-40 seconds, 48-48% CPU load, and 47-56 seconds w/o setting affinity
Debian has 2.6.18, 2.6.22, 2.6.24, 2.6.25, 2.6.26.
2.6.22 won't work with my lvm, so I can't boot that, so I tried 2.6.24:
2.6.24 doesn't have sched_debug enabled in the stock kernel unfortunately, but the output of cfs-debug-info.sh is available here, maybe it contains some useful info:
http://edwintorok.googlepages.com/testrun-1219645937.tar.gz
Is this enough info for you to reproduce the problem, or do you want me to try and bisect?
mmap() and munmap() need to take the mmap_sem for writing (since they
modify the memory map) and you let each thread (one for each cpu) take
that process wide lock, twice, for a million times.
mmap_sem, it's process wide, and your test prog bangs on it like there's
no tomorrow.
Guess what happens ;-)
So the problem is that doing mmap() doesn't scale well with multiple threads, because there is contention on mmap_sem?
Indeed.
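For readers who want to see the pattern being discussed, here is a minimal sketch of a thread worker that does nothing but mmap() and munmap() in a loop, so that every thread contends on the same per-process mmap_sem. This is not the test program from the thread; the names mmap_worker, run_mmap_bench, MAP_LEN and MAX_THREADS are made up for illustration, and the mapping size is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <time.h>

#define MAP_LEN     4096   /* assumed mapping size, one small anonymous region */
#define MAX_THREADS 64     /* arbitrary upper bound for this sketch */

struct worker_arg {
    long iterations;       /* number of mmap/munmap pairs to perform */
    long failures;         /* number of mmap() calls that failed */
};

/* Each iteration takes mmap_sem for writing twice: once in mmap(),
 * once in munmap(). All threads of the process share that lock. */
static void *mmap_worker(void *p)
{
    struct worker_arg *arg = p;
    assert(arg != NULL);
    for (long i = 0; i < arg->iterations; i++) {
        void *m = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED) {
            arg->failures++;
            continue;
        }
        munmap(m, MAP_LEN);
    }
    return NULL;
}

/* Runs nthreads workers and returns the elapsed wall-clock time in
 * seconds, or -1.0 on a bad argument or thread creation failure. */
static double run_mmap_bench(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].failures = 0;
        if (pthread_create(&tids[i], NULL, mmap_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling run_mmap_bench(1, n) and then run_mmap_bench(4, n) from a main() and comparing the times gives a rough idea of how much the threads serialize on the lock. Build with -pthread.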
Why did 2.6.24 seem to work better?
Perhaps the scheduler overhead did increase, can you try:
echo NO_HRTICK > /debug/sched_features
(after mounting debugfs on /debug, or adjusting the path to where you do
have it mounted)
That might cause some overhead on very high context switch rates.
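To check whether a run really does hit a high context switch rate, one option is to read the process's own counters with getrusage() before and after the workload. A minimal sketch, assuming Linux, where ru_nvcsw and ru_nivcsw are filled in; the struct ctxsw and read_ctxsw names are invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Context switch counters for the calling process (all threads). */
struct ctxsw {
    long voluntary;    /* ru_nvcsw: the task blocked or yielded */
    long involuntary;  /* ru_nivcsw: the task was preempted */
};

/* Fills *out from getrusage(RUSAGE_SELF); returns 0 on success, -1 on error. */
static int read_ctxsw(struct ctxsw *out)
{
    struct rusage ru;

    assert(out != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->voluntary = ru.ru_nvcsw;
    out->involuntary = ru.ru_nivcsw;
    return 0;
}
```

Taking one reading before the test loop and one after, then dividing the difference by the elapsed time, gives switches per second, which can be compared with and without NO_HRTICK set.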