Re: Strange performance behavior of 2.4.0-test9

From: Rik van Riel (riel@conectiva.com.br)
Date: Wed Oct 25 2000 - 09:47:13 EST


On Wed, 25 Oct 2000 kumon@flab.fujitsu.co.jp wrote:

> I have found some very odd performance behavior in 2.4.0-test9 on a
> large SMP server, and I would like some clues for investigating it.
>
> The workload is WebBench.
> The machine is:
> an 8-way PII-Xeon 450MHz, 2MB cache, with the Profusion chipset,
> 1GB main memory with no highmem,
> using 4 eepro100 network interfaces.
>
> As shown in the table below, test8 is clearly the best in both
> absolute performance and scalability.
>
> WebBench performance results:
>               Req/sec (4cpu -> 8cpu)
> --------------------------------------
> 2.4.0-test1   2816 -> 3702   (31% up)
> 2.4.0-test8   4006 -> 5287   (63% up)
> 2.4.0-test9   3669 -> 2193   (40% down)
> (4cpu means two CPUs for each bus.)
>
> But test9 shows the following problems:
>
> 1) In the 8-cpu configuration, test9 shows extremely inferior
> performance.
> 2) On test8, the 8-cpu configuration shows about 2/3 of the 4-cpu performance.
        ^^^^^ test9 ??

> Another experiment shows that, as the number of CPUs increases from
> 4 to 8, the performance gradually decreases:
> CPUs Req/s
> 4cpu 3660
> 5cpu 3070
> 6cpu 2730
> 7cpu 2611
> 8cpu 2210
>
> The reason for the performance degradation is an increase in idle
> time and an increase in context switches (2.2x @4cpu, 3.1x @8cpu).

OUCH ...

> The following table shows one-minute averages of "vmstat 1".
>
> configuration Req/s | r  b intr  c-sw  user os idle
> --------------------+------------------------------
> 2.4.0t1 4cpu  2816  | 60 0 28877 5103  18   82  0
>         8cpu  3702  | 37 0 33495 16387 14   86  0
> 2.4.0t8 4cpu  4066  | 57 0 38743 8674  27   73  0
>         8cpu  5287  | 33 0 40755 30626 21   78  0
> 2.4.0t9 4cpu  3669  | 53 3 35822 18898 25   73  2
>         8cpu  2193  | 5  8 22114 94609 9    52 39
                                                 ^^

This is pretty insane and is definitely a bug which should
be fixed. I'll search the source for "suspicious" changes
and try to come up with a patch you can test.

(unfortunately I don't have anything bigger than a dual-CPU
test machine here, so I won't be able to test it myself ;( )

> In the experiments, enough free memory was kept; no swap-in/out
> happened and almost no block-in/out occurred. Interrupts are
> roughly proportional to the performance.
>
> The performance drop in 2.4.0-t9 at 4 CPUs is not so prominent, but
> some processes are actually blocked from running for some reason.
> And as the number of CPUs increases, the performance degrades:
> on 8 CPUs, only 1/10 of the processes are runnable.
>
> I've tried test10-pre5 but nothing improved.
>
> Can anybody tell me what I should do?

It would be great if you could test whatever patches people
have floating around to fix this issue.

One thing which may matter is that the scheduler has changed
a bit and the avg_slice cache-thrashing heuristics have been
removed from reschedule_idle().
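
The idea behind that heuristic: don't preempt another CPU for a
woken-up task whose average timeslice is shorter than the time it
takes to refill the CPU cache, since the preemption then costs more
in cache misses than it buys in latency. Here is a from-memory
userspace sketch of that decision, not the literal test8 code;
avg_slice and cacheflush_time are the real kernel names, while the
struct and the numbers are made up for illustration:

	#include <stdio.h>

	/* Illustrative stand-in; the kernel computes cacheflush_time
	 * at boot from cache size and memory bandwidth. */
	static unsigned long cacheflush_time = 200000;	/* cycles, made up */

	struct task {
		const char *name;
		unsigned long avg_slice; /* decayed average CPU burst, cycles */
	};

	/* The gist of the removed test8 check: a task whose runs are
	 * shorter than one cache refill isn't worth kicking another
	 * CPU for. */
	static int worth_preempting(const struct task *p)
	{
		return p->avg_slice >= cacheflush_time;
	}

	int main(void)
	{
		struct task httpd  = { "short-lived httpd", 50000 };
		struct task crunch = { "long-running job", 900000 };

		printf("%s -> %s\n", httpd.name,
		       worth_preempting(&httpd) ? "preempt" : "leave queued");
		printf("%s -> %s\n", crunch.name,
		       worth_preempting(&crunch) ? "preempt" : "leave queued");
		return 0;
	}

With that check gone, every wakeup becomes eligible to kick a CPU,
which would fit the context-switch explosion (30626 -> 94609 at
8 CPUs) in the vmstat table above.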

Could you give me a few lines of full vmstat output from
your tests? I'd really like to get this bug fixed...

regards,

Rik

--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/



