Re: Unusually high system CPU usage with recent kernels

From: Tibor Billes
Date: Tue Aug 27 2013 - 16:05:51 EST


> From: Paul E. McKenney Sent: 08/26/13 06:28 AM
> On Sun, Aug 25, 2013 at 09:50:21PM +0200, Tibor Billes wrote:
> > From: Paul E. McKenney Sent: 08/24/13 11:03 PM
> > > On Sat, Aug 24, 2013 at 09:59:45PM +0200, Tibor Billes wrote:
> > > > From: Paul E. McKenney Sent: 08/22/13 12:09 AM
> > > > > On Wed, Aug 21, 2013 at 11:05:51PM +0200, Tibor Billes wrote:
> > > > > > > From: Paul E. McKenney Sent: 08/21/13 09:12 PM
> > > > > > > On Wed, Aug 21, 2013 at 08:14:46PM +0200, Tibor Billes wrote:
> > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 11:43 PM
> > > > > > > > > On Tue, Aug 20, 2013 at 10:52:26PM +0200, Tibor Billes wrote:
> > > > > > > > > > > From: Paul E. McKenney Sent: 08/20/13 04:53 PM
> > > > > > > > > > > On Tue, Aug 20, 2013 at 08:01:28AM +0200, Tibor Billes wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I was using the 3.9.7 stable release and tried to upgrade to the 3.10.x series.
> > > > > > > > > > > > The 3.10.x series was showing unusually high (>75%) system CPU usage in some
> > > > > > > > > > > > situations, making things really slow. The latest stable I tried is 3.10.7.
> > > > > > > > > > > > I also tried 3.11-rc5; both show this behaviour. This behaviour doesn't
> > > > > > > > > > > > show up when the system is idling, only when doing some CPU intensive work,
> > > > > > > > > > > > like compiling with multiple threads. Compiling with only one thread seems not
> > > > > > > > > > > > to trigger this behaviour.
> > > > > > > > > > > >
> > > > > > > > > > > > To be more precise I did a `perf record -a` while compiling a large C++ program
> > > > > > > > > > > > with scons using 4 threads, the result is appended at the end of this email.
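[Roughly what that measurement amounts to, for anyone repeating it; the
30-second window below is an arbitrary stand-in for "while the build is
running", and is not from the original report.]

    # system-wide profile while the parallel build runs, then a summary
    perf record -a -- sleep 30
    perf report --stdio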
> > > > > > > > > > >
> > > > > > > > > > > New one on me! You are running a mainstream system (x86_64), so I am
> > > > > > > > > > > surprised no one else noticed.
> > > > > > > > > > >
> > > > > > > > > > > Could you please send along your .config file?
> > > > > > > > > >
> > > > > > > > > > Here it is
> > > > > > > > >
> > > > > > > > > Interesting. I don't see RCU stuff all that high on the list, but
> > > > > > > > > the items I do see lead me to suspect RCU_FAST_NO_HZ, which has some
> > > > > > > > > relevance to the otherwise inexplicable group of commits you located
> > > > > > > > > with your bisection. Could you please rerun with CONFIG_RCU_FAST_NO_HZ=n?
> > > > > > > > >
> > > > > > > > > If that helps, there are some things I could try.
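[A minimal sketch of how that option can be switched off before rebuilding;
the kernel tree path and the -j4 job count are placeholders.]

    cd ~/src/linux                            # kernel source tree (placeholder path)
    scripts/config --disable RCU_FAST_NO_HZ   # clear CONFIG_RCU_FAST_NO_HZ in the existing .config
    make olddefconfig                         # resolve any dependent options with defaults
    make -j4 && sudo make modules_install install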
> > > > > > > >
> > > > > > > > It did help. I didn't notice anything unusual when running with CONFIG_RCU_FAST_NO_HZ=n.
> > > > > > >
> > > > > > > Interesting. Thank you for trying this -- and we at least have a
> > > > > > > short-term workaround for this problem. I will put a patch together
> > > > > > > for further investigation.
> > > > > >
> > > > > > I don't specifically need this config option so I'm fine without it in
> > > > > > the long term, but I guess it's not supposed to behave like that.
> > > > >
> > > > > OK, good, we have a long-term workaround for your specific case,
> > > > > even better. ;-)
> > > > >
> > > > > But yes, there are situations where RCU_FAST_NO_HZ needs to work
> > > > > a bit better. I hope you will bear with me with a bit more
> > > > > testing...
> > > > >
> > > > > > > In the meantime, could you please tell me how you were measuring
> > > > > > > performance for your kernel builds? Wall-clock time required to complete
> > > > > > > one build? Number of builds completed per unit time? Something else?
> > > > > >
> > > > > > Actually, I wasn't being that sophisticated. I have a system monitor
> > > > > > applet on my top panel (using MATE, Linux Mint), four little graphs,
> > > > > > one of which shows CPU usage. Different colors indicate different kinds
> > > > > > of CPU usage. Blue shows user space usage, red shows system usage, and
> > > > > > two more show nice and iowait. During a normal compile it's almost
> > > > > > completely filled with blue user space CPU usage; only the top few
> > > > > > pixels show some iowait and system usage. With CONFIG_RCU_FAST_NO_HZ
> > > > > > set, about 3/4 of the graph was red system CPU usage, the rest was
> > > > > > blue. So I just looked for a pile of red on my graphs when I tested
> > > > > > different kernel builds. But compile speed was also horrible; I couldn't
> > > > > > wait for the build to finish. Even the UI got unresponsive.
> > > > >
> > > > > We have been having problems with CPU accounting, but this one looks
> > > > > quite real.
> > > > >
> > > > > > Now I did some measuring. In the normal case a compile finished in 36
> > > > > > seconds and compiled 315 object files. Here are some output lines from
> > > > > > dstat -tasm --vm during compile:
> > > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > > >      time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > > 21-08 21:48:05| 91   8   2   0   0   0|    0 5852k|    0     0|    0     0|1413  1772 |    0 7934M| 581M 58.0M  602M 6553M|    0   71k   46k   54k
> > > > > > 21-08 21:48:06| 93   6   1   0   0   0|    0 2064k| 137B  131B|    0     0|1356  1650 |    0 7934M| 649M 58.0M  604M 6483M|    0   72k   47k   28k
> > > > > > 21-08 21:48:07| 86  11   4   0   0   0|    0 5872k|    0     0|    0     0|2000  2991 |    0 7934M| 577M 58.0M  627M 6531M|    0   99k   67k   79k
> > > > > > 21-08 21:48:08| 87   9   3   0   0   0|    0 2840k|    0     0|    0     0|2558  4164 |    0 7934M| 597M 58.0M  632M 6507M|    0   96k   57k   51k
> > > > > > 21-08 21:48:09| 93   7   1   0   0   0|    0 3032k|    0     0|    0     0|1329  1512 |    0 7934M| 641M 58.0M  626M 6469M|    0   61k   48k   39k
> > > > > > 21-08 21:48:10| 93   6   0   0   0   0|    0 4984k|    0     0|    0     0|1160  1146 |    0 7934M| 572M 58.0M  628M 6536M|    0   50k   40k   57k
> > > > > > 21-08 21:48:11| 86   9   6   0   0   0|    0 2520k|    0     0|    0     0|2947  4760 |    0 7934M| 605M 58.0M  631M 6500M|    0  103k   55k   45k
> > > > > > 21-08 21:48:12| 90   8   2   0   0   0|    0 2840k|    0     0|    0     0|2674  4179 |    0 7934M| 671M 58.0M  635M 6431M|    0   84k   59k   42k
> > > > > > 21-08 21:48:13| 90   9   1   0   0   0|    0 4656k|    0     0|    0     0|1223  1410 |    0 7934M| 643M 58.0M  638M 6455M|    0   90k   62k   68k
> > > > > > 21-08 21:48:14| 91   8   1   0   0   0|    0 3572k|    0     0|    0     0|1432  1828 |    0 7934M| 647M 58.0M  641M 6447M|    0   81k   59k   57k
> > > > > > 21-08 21:48:15| 91   8   1   0   0   0|    0 5116k| 116B     0|    0     0|1194  1295 |    0 7934M| 605M 58.0M  644M 6487M|    0   69k   54k   64k
> > > > > > 21-08 21:48:16| 87  10   3   0   0   0|    0 5140k|    0     0|    0     0|1761  2586 |    0 7934M| 584M 58.0M  650M 6502M|    0  105k   64k   68k
> > > > > >
> > > > > > The abnormal case compiled only 182 object files in six and a half minutes,
> > > > > > then I stopped it. The same dstat output for this case:
> > > > > > ----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
> > > > > >      time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
> > > > > > 21-08 22:10:49| 27  62   0   0  11   0|    0     0| 210B     0|    0     0|1414 3137k|    0 7934M| 531M 57.6M  595M 6611M|    0  1628  1250   322
> > > > > > 21-08 22:10:50| 25  60   4   0  11   0|    0   88k| 126B     0|    0     0|1337 3110k|    0 7934M| 531M 57.6M  595M 6611M|    0    91   128   115
> > > > > > 21-08 22:10:51| 26  63   0   0  11   0|    0  184k| 294B     0|    0     0|1411 3147k|    0 7934M| 531M 57.6M  595M 6611M|    0  1485   814   815
> > > > > > 21-08 22:10:52| 26  63   0   0  11   0|    0     0| 437B  239B|    0     0|1355 3160k|    0 7934M| 531M 57.6M  595M 6611M|    0    24    94    97
> > > > > > 21-08 22:10:53| 26  63   0   0  11   0|    0     0| 168B     0|    0     0|1397 3155k|    0 7934M| 531M 57.6M  595M 6611M|    0   479   285   273
> > > > > > 21-08 22:10:54| 26  63   0   0  11   0|    0 4096B| 396B  324B|    0     0|1346 3154k|    0 7934M| 531M 57.6M  595M 6611M|    0    27   145   145
> > > > > > 21-08 22:10:55| 26  63   0   0  11   0|    0   60k|    0     0|    0     0|1353 3148k|    0 7934M| 531M 57.6M  595M 6610M|    0    93   117    36
> > > > > > 21-08 22:10:56| 26  63   0   0  11   0|    0     0|    0     0|    0     0|1341 3172k|    0 7934M| 531M 57.6M  595M 6610M|    0   158    87    74
> > > > > > 21-08 22:10:57| 26  62   1   0  11   0|    0     0|  42B   60B|    0     0|1332 3162k|    0 7934M| 531M 57.6M  595M 6610M|    0    56    82    78
> > > > > > 21-08 22:10:58| 26  63   0   0  11   0|    0     0|    0     0|    0     0|1334 3178k|    0 7934M| 531M 57.6M  595M 6610M|    0    26    56    56
> > > > > > 21-08 22:10:59| 26  63   0   0  11   0|    0     0|    0     0|    0     0|1336 3179k|    0 7934M| 531M 57.6M  595M 6610M|    0     3    33    32
> > > > > > 21-08 22:11:00| 26  63   0   0  11   0|    0   24k|  90B  108B|    0     0|1347 3172k|    0 7934M| 531M 57.6M  595M 6610M|    0    41    73    71
> > > > > >
> > > > > > I have four logical cores, so 25% corresponds to one core. I don't know
> > > > > > whether the ~26% user CPU usage is related to that or is just a
> > > > > > coincidence. The rest is ~63% system and ~11% hardware interrupt. Do these
> > > > > > support what you suspect?
> > > > >
> > > > > The massive increase in context switches does come as a bit of a surprise!
> > > > > It does rule out my initial suspicion of lock contention, but then again
> > > > > the fact that you have only four CPUs made that pretty unlikely to begin
> > > > > with.
> > > > >
> > > > > 2.4k average context switches in the good case for the full run vs. 3,156k
> > > > > for about half of a run in the bad case. That is an increase of more
> > > > > than three orders of magnitude!
> > > > >
> > > > > Yow!!!
> > > > >
> > > > > Page faults are actually -higher- in the good case. You have about 6.5GB
> > > > > free in both cases, so you are not running out of memory. Lots more disk
> > > > > writes in the good case, perhaps consistent with its getting more done.
> > > > > Networking is negligible in both cases.
> > > > >
> > > > > Lots of hardware interrupts in the bad case as well. Would you be willing
> > > > > to take a look at /proc/interrupts before and after to see which one you
> > > > > are getting hit with? (Or whatever interrupt tracking tool you prefer.)
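[One way to capture that, as a rough sketch; the file names are arbitrary
and scons -j4 simply stands in for the workload described above.]

    cat /proc/interrupts > irq.before      # snapshot the interrupt counters
    scons -j4                              # run the build
    cat /proc/interrupts > irq.after       # snapshot again
    diff -u irq.before irq.after           # see which counters changed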
> > > >
> > > > Here are the results.
> > > >
> > > > Good case before:
> > > >          CPU0     CPU1     CPU2     CPU3
> > > >   0:       17        0        0        0   IO-APIC-edge      timer
> > > >   1:      356        1       68        4   IO-APIC-edge      i8042
> > > >   8:        0        0        1        0   IO-APIC-edge      rtc0
> > > >   9:      330       14      449       71   IO-APIC-fasteoi   acpi
> > > >  12:       10      108      269     2696   IO-APIC-edge      i8042
> > > >  16:       36       10      111        2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > >  17:       20        3       25        4   IO-APIC-fasteoi   mmc0
> > > >  21:        3        0       34        0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > >  40:        0        1       12       11   PCI-MSI-edge      mei_me
> > > >  41:    10617      173     9959      292   PCI-MSI-edge      ahci
> > > >  42:      862       11      186       26   PCI-MSI-edge      xhci_hcd
> > > >  43:      107       77       27      102   PCI-MSI-edge      i915
> > > >  44:     5322       20      434       22   PCI-MSI-edge      iwlwifi
> > > >  45:      180        0      183       86   PCI-MSI-edge      snd_hda_intel
> > > >  46:        0        3        0        0   PCI-MSI-edge      eth0
> > > > NMI:        1        0        0        0   Non-maskable interrupts
> > > > LOC:    16312    15177    10840     8995   Local timer interrupts
> > > > SPU:        0        0        0        0   Spurious interrupts
> > > > PMI:        1        0        0        0   Performance monitoring interrupts
> > > > IWI:     1160      523     1031      481   IRQ work interrupts
> > > > RTR:        3        0        0        0   APIC ICR read retries
> > > > RES:    14976    16135     9973    10784   Rescheduling interrupts
> > > > CAL:      482      457      151      370   Function call interrupts
> > > > TLB:       70      106      352      230   TLB shootdowns
> > > > TRM:        0        0        0        0   Thermal event interrupts
> > > > THR:        0        0        0        0   Threshold APIC interrupts
> > > > MCE:        0        0        0        0   Machine check exceptions
> > > > MCP:        2        2        2        2   Machine check polls
> > > > ERR:        0
> > > > MIS:        0
> > > >
> > > > Good case after:
> > > >          CPU0     CPU1     CPU2     CPU3
> > > >   0:       17        0        0        0   IO-APIC-edge      timer
> > > >   1:      367        1       81        4   IO-APIC-edge      i8042
> > > >   8:        0        0        1        0   IO-APIC-edge      rtc0
> > > >   9:      478       14      460       71   IO-APIC-fasteoi   acpi
> > > >  12:       10      108      269     2696   IO-APIC-edge      i8042
> > > >  16:       36       10      111        2   IO-APIC-fasteoi   ehci_hcd:usb1
> > > >  17:       20        3       25        4   IO-APIC-fasteoi   mmc0
> > > >  21:        3        0       34        0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > >  40:        0        1       12       11   PCI-MSI-edge      mei_me
> > > >  41:    16888      173     9959      292   PCI-MSI-edge      ahci
> > > >  42:     1102       11      186       26   PCI-MSI-edge      xhci_hcd
> > > >  43:      107      132       27      136   PCI-MSI-edge      i915
> > > >  44:     6943       20      434       22   PCI-MSI-edge      iwlwifi
> > > >  45:      180        0      183       86   PCI-MSI-edge      snd_hda_intel
> > > >  46:        0        3        0        0   PCI-MSI-edge      eth0
> > > > NMI:        4        3        3        3   Non-maskable interrupts
> > > > LOC:    26845    24780    19025    17746   Local timer interrupts
> > > > SPU:        0        0        0        0   Spurious interrupts
> > > > PMI:        4        3        3        3   Performance monitoring interrupts
> > > > IWI:     1637      751     1287      695   IRQ work interrupts
> > > > RTR:        3        0        0        0   APIC ICR read retries
> > > > RES:    26511    26673    18791    20194   Rescheduling interrupts
> > > > CAL:      510      480      151      370   Function call interrupts
> > > > TLB:      361      292      575      461   TLB shootdowns
> > > > TRM:        0        0        0        0   Thermal event interrupts
> > > > THR:        0        0        0        0   Threshold APIC interrupts
> > > > MCE:        0        0        0        0   Machine check exceptions
> > > > MCP:        2        2        2        2   Machine check polls
> > > > ERR:        0
> > > > MIS:        0
> > > >
> > > > Bad case before:
> > > >          CPU0     CPU1     CPU2     CPU3
> > > >   0:       17        0        0        0   IO-APIC-edge      timer
> > > >   1:      172        3       78        3   IO-APIC-edge      i8042
> > > >   8:        0        1        0        0   IO-APIC-edge      rtc0
> > > >   9:     1200      148      395       81   IO-APIC-fasteoi   acpi
> > > >  12:     1625        2      348       10   IO-APIC-edge      i8042
> > > >  16:       26       23      115        0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > >  17:       16        3       12       21   IO-APIC-fasteoi   mmc0
> > > >  21:        2        2       33        0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > >  40:        0        0       14       10   PCI-MSI-edge      mei_me
> > > >  41:    15776      374     8497      687   PCI-MSI-edge      ahci
> > > >  42:     1297      829      115       24   PCI-MSI-edge      xhci_hcd
> > > >  43:      103      149        9      212   PCI-MSI-edge      i915
> > > >  44:    13151      101      511       91   PCI-MSI-edge      iwlwifi
> > > >  45:      153      159        0      122   PCI-MSI-edge      snd_hda_intel
> > > >  46:        0        1        1        0   PCI-MSI-edge      eth0
> > > > NMI:       32       31       31       31   Non-maskable interrupts
> > > > LOC:    82504    82732    74172    75985   Local timer interrupts
> > > > SPU:        0        0        0        0   Spurious interrupts
> > > > PMI:       32       31       31       31   Performance monitoring interrupts
> > > > IWI:    17816    16278    13833    13282   IRQ work interrupts
> > > > RTR:        3        0        0        0   APIC ICR read retries
> > > > RES:    18784    21084    13313    12946   Rescheduling interrupts
> > > > CAL:      393      422      306      356   Function call interrupts
> > > > TLB:      231      176      235      191   TLB shootdowns
> > > > TRM:        0        0        0        0   Thermal event interrupts
> > > > THR:        0        0        0        0   Threshold APIC interrupts
> > > > MCE:        0        0        0        0   Machine check exceptions
> > > > MCP:        3        3        3        3   Machine check polls
> > > > ERR:        0
> > > > MIS:        0
> > > >
> > > > Bad case after:
> > > >          CPU0     CPU1     CPU2     CPU3
> > > >   0:       17        0        0        0   IO-APIC-edge      timer
> > > >   1:      415        3       85        3   IO-APIC-edge      i8042
> > > >   8:        0        1        0        0   IO-APIC-edge      rtc0
> > > >   9:     1277      148      428       81   IO-APIC-fasteoi   acpi
> > > >  12:     1625        2      348       10   IO-APIC-edge      i8042
> > > >  16:       26       23      115        0   IO-APIC-fasteoi   ehci_hcd:usb1
> > > >  17:       16        3       12       21   IO-APIC-fasteoi   mmc0
> > > >  21:        2        2       33        0   IO-APIC-fasteoi   ehci_hcd:usb2
> > > >  40:        0        0       14       10   PCI-MSI-edge      mei_me
> > > >  41:    17814      374     8497      687   PCI-MSI-edge      ahci
> > > >  42:     1567      829      115       24   PCI-MSI-edge      xhci_hcd
> > > >  43:      103      177        9      242   PCI-MSI-edge      i915
> > > >  44:    14956      101      511       91   PCI-MSI-edge      iwlwifi
> > > >  45:      153      159        0      122   PCI-MSI-edge      snd_hda_intel
> > > >  46:        0        1        1        0   PCI-MSI-edge      eth0
> > > > NMI:       36       35       34       34   Non-maskable interrupts
> > > > LOC:    92429    92708    81714    84071   Local timer interrupts
> > > > SPU:        0        0        0        0   Spurious interrupts
> > > > PMI:       36       35       34       34   Performance monitoring interrupts
> > > > IWI:    22594    19658    17439    14257   IRQ work interrupts
> > > > RTR:        3        0        0        0   APIC ICR read retries
> > > > RES:    21491    24670    14618    14569   Rescheduling interrupts
> > > > CAL:      441      439      306      356   Function call interrupts
> > > > TLB:      232      181      274      465   TLB shootdowns
> > > > TRM:        0        0        0        0   Thermal event interrupts
> > > > THR:        0        0        0        0   Threshold APIC interrupts
> > > > MCE:        0        0        0        0   Machine check exceptions
> > > > MCP:        3        3        3        3   Machine check polls
> > > > ERR:        0
> > > > MIS:        0
> > >
> > > Lots more local timer interrupts, which is consistent with the higher
> > > time in interrupt handlers for the bad case.
> > >
> > > > > One hypothesis is that your workload and configuration are interacting
> > > > > with RCU_FAST_NO_HZ to force very large numbers of RCU grace periods.
> > > > > Could you please check for this by building with CONFIG_RCU_TRACE=y,
> > > > > mounting debugfs somewhere and dumping rcu/rcu_sched/rcugp before and
> > > > > after each run?
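[A rough sketch of that procedure, assuming debugfs is not already mounted;
the output file names are arbitrary.]

    # requires a kernel built with CONFIG_RCU_TRACE=y
    sudo mount -t debugfs none /sys/kernel/debug
    cat /sys/kernel/debug/rcu/rcu_sched/rcugp > rcugp.before
    scons -j4                              # run the workload
    cat /sys/kernel/debug/rcu/rcu_sched/rcugp > rcugp.after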
> > > >
> > > > Good case before:
> > > > completed=8756  gpnum=8757  age=0  max=21
> > > > after:
> > > > completed=14686  gpnum=14687  age=0  max=21
> > > >
> > > > Bad case before:
> > > > completed=22970  gpnum=22971  age=0  max=21
> > > > after:
> > > > completed=26110  gpnum=26111  age=0  max=21
> > >
> > > In the good case, (14686-8756)/40=148.25 grace periods per second, which
> > > is a fast but reasonable rate given your HZ=250. Not a large difference
> > > in the number of grace periods, but extrapolating for the longer runtime,
> > > maybe ten times as much. But not much change in grace-period rate per
> > > unit time.
> > >
> > > > The test scenario was the following in both cases (mixed english and pseudo-bash):
> > > > reboot, login, start terminal
> > > > cd project
> > > > rm -r build
> > > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > > scons -j4
> > > > wait ~40 sec (good case finished, Ctrl-C in bad case)
> > > > cat /proc/interrupts >> somefile ; cat /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
> > > >
> > > > I stopped the build in the bad case after about the same time the good
> > > > case finished, so extra interrupts and RCU grace periods caused by a longer
> > > > runtime don't skew the results.
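[The scenario above as an actual script, in case anyone wants to repeat it;
the project path is a placeholder, and timeout -s INT stands in for the
manual Ctrl-C after roughly 40 seconds.]

    #!/bin/bash
    # snapshot interrupts and RCU grace periods around a bounded parallel build
    cd ~/project                           # placeholder for the real source tree
    rm -r build
    cat /proc/interrupts /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile
    timeout -s INT 40 scons -j4            # ~40 s, then SIGINT (like Ctrl-C)
    cat /proc/interrupts /sys/kernel/debug/rcu/rcu_sched/rcugp >> somefile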
> > >
> > > That procedure works for me, thank you for laying it out carefully.
> > >
> > > I believe I see what is going on and how to fix it, though it may take
> > > me a bit to work things through and get a good patch.
> > >
> > > Thank you very much for your testing efforts!
> >
> > I'm glad I can help. I've been using Linux for many years, and now I have a
> > chance to help the community and give something back. I'm quite
> > enjoying this :)
>
> ;-)
>
> Here is a patch that is more likely to help. I am testing it in parallel,
> but figured I should send you a sneak preview.

I tried it, but I don't see any difference in overall performance. The dstat
output also shows the same as before.

But I did notice something. Occasionally there is an increase in userspace
CPU usage, interrupts and context switches drop, and it really gets
more work done (scons printed commands more frequently). I checked that
this behaviour is also present without your patch; I just didn't notice it
before. Maybe you can make some sense out of it.

----system---- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----swap--- ------memory-usage----- -----virtual-memory----
     time     |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  free| used  buff  cach  free|majpf minpf alloc  free
27-08 20:51:53| 23  62   5   0  11   0|    0     0|    0     0|    0     0|1274 3102k|    0 7934M| 549M 56.0M  491M 6698M|    0    28   156   159
27-08 20:51:54| 24  64   1   0  11   0|    0     0|    0     0|    0     0|1317 3165k|    0 7934M| 549M 56.0M  491M 6698M|    0    53   189   182
27-08 20:51:55| 33  50   6   2   9   0| 192k 1832k|    0     0|    0     0|1371 2442k|    0 7934M| 544M 56.0M  492M 6702M|    0   30k   17k   17k
27-08 20:51:56| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1313 3220k|    0 7934M| 544M 56.0M  492M 6701M|    0    21   272   232
27-08 20:51:57| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1319 3226k|    0 7934M| 544M 56.0M  492M 6701M|    0     8    96   112
27-08 20:51:58| 25  63   0   0  12   0|    0     0|    0     0|    0     0|1317 3224k|    0 7934M| 544M 56.0M  492M 6701M|    0    12   145   141
27-08 20:51:59| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1317 3223k|    0 7934M| 544M 56.0M  492M 6701M|    0    54   193   191
27-08 20:52:00| 25  63   0   0  12   0|    0   24k|    0     0|    0     0|1336 3216k|    0 7934M| 544M 56.0M  492M 6701M|    0    36   161   172
27-08 20:52:01| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1313 3225k|    0 7934M| 544M 56.0M  492M 6701M|    0     9   107   107
27-08 20:52:02| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1327 3224k|    0 7934M| 545M 56.0M  492M 6701M|    0    13   193   200
27-08 20:52:03| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1311 3226k|    0 7934M| 545M 56.0M  492M 6701M|    0    13   114   114
27-08 20:52:04| 25  63   0   0  12   0|    0     0|    0     0|    0     0|1331 3223k|    0 7934M| 544M 56.0M  492M 6701M|    0    41   190   178
27-08 20:52:05| 24  64   0   0  12   0|    0 8192B|    0     0|    0     0|1315 3222k|    0 7934M| 544M 56.0M  492M 6701M|    0    30   123   122
27-08 20:52:06| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1314 3223k|    0 7934M| 544M 56.0M  492M 6701M|    0    16   187   195
27-08 20:52:07| 25  63   1   0  12   0|2212k  192k|    0     0|    0     0|1637 3194k|    0 7934M| 544M 56.2M  494M 6699M|    0  1363  2590  1947
27-08 20:52:08| 17  33  18  26   6   0|3208k     0|    0     0|    0     0|1351 1766k|    0 7934M| 561M 56.3M  499M 6678M|    4   10k  7620  2055
27-08 20:52:09| 47  21  16  13   4   0|4332k  628k|    0     0|    0     0|1400 1081k|    0 7934M| 647M 56.3M  504M 6587M|   10   24k   25k  1151
27-08 20:52:10| 36  34  13  11   6   0|2636k 2820k|    0     0|    0     0|1451 1737k|    0 7934M| 598M 56.3M  507M 6632M|    5   19k   16k   28k
27-08 20:52:11| 46  17  10  25   3   0|4288k  536k|    0     0|    0     0|1386  868k|    0 7934M| 613M 56.3M  513M 6611M|   24   13k  8908  3616
27-08 20:52:12| 53  33   5   4   5   0|4740k 3992k|    0     0|    0     0|1773 1464k|    0 7934M| 562M 56.6M  521M 6654M|    0   36k   29k   40k
27-08 20:52:13| 60  34   0   1   6   0|4228k 1416k|    0     0|    0     0|1642 1670k|    0 7934M| 593M 56.6M  526M 6618M|    0   36k   26k   17k
27-08 20:52:14| 53  37   1   3   7   0|3008k 1976k|    0     0|    0     0|1513 1972k|    0 7934M| 541M 56.6M  529M 6668M|    0   10k  9986   23k
27-08 20:52:15| 55  34   1   4   6   0|3636k 1284k|    0     0|    0     0|1645 1688k|    0 7934M| 581M 56.6M  535M 6622M|    0   43k   30k   18k
27-08 20:52:16| 57  30   5   2   5   0|4404k 2320k|    0     0|    0     0|1715 1489k|    0 7934M| 570M 56.6M  541M 6627M|    0   39k   24k   26k
27-08 20:52:17| 50  35   3   7   6   0|2520k 1972k|    0     0|    0     0|1699 1598k|    0 7934M| 587M 56.6M  554M 6596M|    0   65k   40k   32k
27-08 20:52:18| 52  40   2   1   7   0|1556k 1732k|    0     0|    0     0|1582 1865k|    0 7934M| 551M 56.6M  567M 6619M|    0   35k   26k   33k
27-08 20:52:19| 26  62   0   0  12   0|    0     0|    0     0|    0     0|1351 3240k|    0 7934M| 551M 56.6M  568M 6619M|    0    86   440   214
27-08 20:52:20| 26  63   0   0  11   0|    0  108k|    0     0|    0     0|1392 3162k|    0 7934M| 555M 56.6M  560M 6623M|    0  1801  1490  2672
27-08 20:52:21| 25  63   0   0  12   0|    0     0|    0     0|    0     0|1332 3198k|    0 7934M| 555M 56.6M  560M 6623M|    0    50   245   255
27-08 20:52:22| 25  63   1   0  12   0|    0     0|    0     0|    0     0|1350 3220k|    0 7934M| 556M 56.6M  560M 6622M|    0   755   544   286
27-08 20:52:23| 27  62   1   0  11   0|    0  272k|    0     0|    0     0|1323 3092k|    0 7934M| 551M 56.6M  558M 6628M|    0   341  1464  3085
27-08 20:52:24| 25  63   0   0  12   0|    0     0|    0     0|    0     0|1334 3197k|    0 7934M| 551M 56.6M  559M 6627M|    0    63   514   273
27-08 20:52:25| 25  63   0   0  12   0|    0   40k|    0     0|    0     0|1329 3243k|    0 7934M| 546M 56.6M  558M 6633M|    0   321   160  1679
27-08 20:52:26| 39  50   1   1   9   0|  48k  644k|    0     0|    0     0|1500 2556k|    0 7934M| 552M 56.6M  560M 6625M|    0   30k   14k   12k
27-08 20:52:27| 26  62   1   0  11   0|    0  192k|    0     0|    0     0|1380 3152k|    0 7934M| 553M 56.6M  560M 6624M|    0  2370   808   718
27-08 20:52:28| 23  55  12   0  10   0|    0     0|    0     0|    0     0|1247 2993k|    0 7934M| 553M 56.6M  561M 6624M|    0  1060   428   241
27-08 20:52:29| 25  63   1   0  11   0|    0     0|    0     0|    0     0|1318 3142k|    0 7934M| 554M 56.6M  561M 6623M|    0   663   442   198
27-08 20:52:30| 25  64   0   0  12   0|    0  100k|    0     0|    0     0|1316 3212k|    0 7934M| 554M 56.6M  561M 6623M|    0    42   187   186
27-08 20:52:31| 24  64   0   0  12   0|    0 4096B|    0     0|    0     0|1309 3222k|    0 7934M| 554M 56.6M  561M 6623M|    0     9    85    85
27-08 20:52:32| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1297 3219k|    0 7934M| 554M 56.6M  561M 6623M|    0    23    95    84
27-08 20:52:33| 25  63   0   0  11   0|    0     0|    0     0|    0     0|1339 3101k|    0 7934M| 555M 56.6M  557M 6625M|    0   923  1566  2176
27-08 20:52:34| 25  63   0   0  11   0|    0     0|    0     0|    0     0|1333 3095k|    0 7934M| 555M 56.6M  559M 6623M|    0   114   701   195
27-08 20:52:35| 26  63   0   0  11   0|    0     0|    0     0|    0     0|1342 3036k|    0 7934M| 557M 56.6M  568M 6613M|    0   957  3225   516
27-08 20:52:36| 26  60   2   0  11   0|    0   28k|    0     0|    0     0|1340 3091k|    0 7934M| 555M 56.6M  568M 6614M|    0  5304  5422  5609
27-08 20:52:37| 25  64   0   0  12   0|    0     0|    0     0|    0     0|1348 3260k|    0 7934M| 556M 56.6M  565M 6617M|    0    30   156  1073
27-08 20:52:38| 24  64   0   0  12   0|    0     0|    0     0|    0     0|1314 3211k|    0 7934M| 556M 56.6M  549M 6633M|    0    11   105  4285
27-08 20:52:39| 26  63   0   0  11   0|    0     0|    0     0|    0     0|1353 3031k|    0 7934M| 558M 56.6M  559M 6620M|    0   847  3866   357
27-08 20:52:40| 26  63   0   0  12   0|    0     0|    0     0|    0     0|1309 3135k|    0 7934M| 569M 56.6M  566M 6602M|    0  3940  5727  1288

Tibor