Re: PROBLEM: spurious temporary freeze, scheduler and preempt problem ?

From: Eric
Date: Wed Feb 18 2004 - 13:15:19 EST


On Wednesday 18 February 2004 9:26 am, Brice Figureau wrote:
Someone correct me if I'm way off base, but have you tried
scheduler=deadline
To try the different kernel schedulers?
Im not sure about the keyword, but do a quick google or search lkml and you'll
see the correct keyword and the different schedulers you can try.


> Hi all,
>
> I'm working on a java server that uses some native libs (aka JNI). The
> server is poll(2)ing a TCP socket with the help of NIO, and dispatch
> network requests to a pool of a threads (currently 4) that send back
> responses to the client. Those threads are mostly executing native code
> (with JNI). The server is using a mysql engine running on the same host
> to store a few things (about 10 records stored per seconds if that
> matters).
>
> I'm running 2.6.3-rc2 on a UP Celeron (same problem on a PIII too)
> system with plenty of RAM, but which runs several other daemons
> (postfix, djbdns... those daemons were not loading the machine during
> the tests).
> The machine was never swapping (see the vmstat output below [2,3]).
>
> I noticed a few weeks ago that when I try to stress-test this particular
> application the host on which it runs completely freeze for 3 to 5
> seconds, nothing seems to run at all during this time (this was
> especially visible during my first test as it was my devl computer
> running X and the whole screen was frozen).
>
> I decided to perform torough testing running on quiet machine without X.
> During my testbed I tried to capture most data I could (vmstat,
> profiling...).
>
> What I've seen is the following:
>
> 1) During the freeze, the system was using almost 80% CPU time in kernel
> and the remaining in user space
>
> 2) During the freeze, even changing from virtual-console to another
> could have been frozen. No virtual-console would be alive.
>
> 3) a top d 1 or top d 0.5 show that my application was running before
> the freeze and after the freeze, with 80% of CPU time in kernel space.
>
> 4) load was increasing faster than I could think (>19 after 30 seconds)
>
> I first thought that a profiling (with profile=1 or oprofile) would give
> me a good candidate about the problem, but I'm not sure right now (see
> the profile output at the end of the mail [4,5])
>
> After that I tried to run vmstat 1 to have a better idea of the problem
> [2,3]. You can see the high amount of context-switch and interrupt. I
> think that this is because vmstat was stuck (like the other process)
> during a few seconds, but the kernel counters were still incrementing.
>
> Then I finally tried several differents kernel and found the following
> results:
>
> 2.6.3-rc2 -> exhibit the high load problem
> 2.4.25-rc3 -> immune to the problem, the load never goes beyong 1.20
>
> I finally did a binary search between several version of 2.6 and found
> that any kernel *before* 2.6.0-test5 (included) are immune to the
> problem and any kernel *after* 2.6.0-test6 (included) have the same load
> problem.
>
> On a side note, 2.6.0-test4-mm2 (the only mm kernel I had already
> compiled) had the problem too.
>
> So what are the differences between test5 and test6 ?
> It seems that the sched_interactive patch from Con Kolivas was included
> in test6, can it be the cause ?
>
> By running and looking at a 'top d 0.5' I couldn't spot any bizarre
> dynamic priorities (my java app is at 15 or 16, never more).
> The only difference I could spot is that kernel with interactive
> scheduler seemed to make runnable several java threads at the same time
> (about 3 to 4) unlike the previous scheduler (only 1~2 runnable process
> at any time).
>
> Something I should also say is that I tested a 2.6.3-rc2 *without*
> preempt and found this kernel to be immune, too.
>
> I also removed all the sql code in my application and no more freeze !
> (Although that does not explain why the combination of the two gives
> those results).
>
> Can someone help me find what the problem is ?
>
> Useful data:
>
> 1) ver_linux
> Linux arsenic 2.6.3-rc2 #2 SMP Wed Feb 18 11:35:24 CET 2004 i686 unknown
> unknown GNU/Linux
>
> Gnu C 3.3.1
> Gnu make 3.80
> util-linux 2.12
> mount 2.11z
> module-init-tools 0.9.11a
> e2fsprogs 1.34
> quota-tools 3.09.
> PPP 2.4.1
> nfs-utils 1.0.5
> Linux C Library 2.3.2
> Dynamic linker (ldd) 2.3.2
> Procps 3.1.11
> Net-tools 1.60
> Console-tools 0.2.3
> Sh-utils 5.0
> Modules Loaded
>
>
> 2) vmstat for a kernel with the problem:
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r b swpd free buff cache si so bi bo in
> cs us sy id wa 3 0 0 371136 8044 61096 0 0 16 120 1035
> 351 63 3 30 4 3 0 0 371136 8044 61112 0 0 12 0
> 1033 532 78 4 18 0 21 0 0 371024 8064 61116 0 0 0
> 148 9038 630633 29 71 0 0 4 0 0 371024 8064 61120 0 0
> 0 0 1064 43014 58 42 0 0 4 0 0 371024 8064 61120 0 0
> 0 0 1159 78750 38 62 0 0 4 0 0 371040 8080 61120 0 0
> 0 76 4298 309703 29 71 0 0 15 0 0 370976 8096 61120 0
> 0 0 96 6833 501193 27 73 0 0 3 0 0 370992 8096 61124
> 0 0 0 0 1094 44694 56 44 0 0 3 0 0 370992 8096 61124
> 0 0 0 0 1087 65851 42 58 0 0 16 0 0 371024 8112
> 61124 0 0 0 76 10511 794459 26 74 0 0 4 0 0 371024
> 8112 61124 0 0 0 0 1035 56591 50 50 0 0 3 0 0 371024
> 8112 61124 0 0 0 0 1025 63930 41 59 0 0 3 0 0
> 371024 8112 61124 0 0 0 0 1007 54424 47 53 0 0 5 0
> 0 371024 8112 61124 0 0 0 0 1100 56909 53 47 0 0 5 0
> 0 370960 8128 61128 0 0 0 60 1154 71518 37 63 0 0 4 0
> 0 370960 8128 61128 0 0 0 0 1041 54728 51 49 0 0 12
> 0 0 370960 8128 61128 0 0 0 12 3147 223442 31 69 0 0
>
> huge freeze before the following line
> 15 0 0 370952 8152 61128 0 0 0 112 13393 1014908 25 75
> 0 0 3 0 0 370944 8152 61128 0 0 0 0 1006 55046 43
> 57 0 0 3 0 0 370944 8152 61128 0 0 0 0 1111 66487
> 40 60 0 0 21 0 0 370920 8168 61128 0 0 0 68 20308
> 1531575 26 74 0 0 procs -----------memory---------- ---swap-- -----io----
> --system-- ----cpu---- r b swpd free buff cache si so bi
> bo in cs us sy id wa 14 0 0 370888 8176 61132 0 0 0
> 20 4228 313614 25 75 0 0 18 0 0 370912 8200 61132 0 0
> 0 112 8133 610932 26 74 0 0 8 0 0 371488 8200 61132 0
> 0 0 48 1513 104797 31 69 0 0 29 0 0 371440 8216 61132
> 0 0 0 68 4490 333022 26 74 0 0
>
> 3) vmstat for a kernel without the problem
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r b swpd free buff cache si so bi bo in
> cs us sy id wa 0 0 0 389060 7884 57472 0 0 153 38 1023
> 91 5 3 85 7 2 0 0 388020 7884 57492 0 0 4 0
> 1012 127 3 3 92 2 2 0 0 380940 7884 57492 0 0 0
> 0 1010 440 94 6 0 0 0 1 0 378636 7928 58460 0 0 1004
> 16 1173 330 47 5 21 26 0 0 0 378276 7928 58600 0 0 136
> 0 1038 291 68 10 11 11 3 0 0 377828 7944 58652 0 0
> 52 268 1021 261 68 10 17 5 3 0 0 377188 7952 58768 0 0
> 112 24 1017 278 63 7 24 6 2 0 0 377060 7952 58828 0
> 0 56 0 1016 276 57 7 27 9 3 0 0 377060 7960 58864 0
> 0 36 88 1020 223 65 8 23 4 0 0 0 376868 7960 58896
> 0 0 28 0 1009 234 66 10 23 1 3 0 0 376820 7968 58936
> 0 0 36 104 1032 229 61 9 26 4 3 0 0 376756 7968
> 58972 0 0 32 0 1010 274 47 6 43 3 0 0 0 376564
> 7968 59000 0 0 28 0 1009 415 66 3 30 1 5 0 0 376308
> 7976 59012 0 0 8 92 1016 387 56 5 39 0 2 0 0
> 376236 7976 59080 0 0 68 0 1004 290 78 1 21 0 4 0
> 0 375980 7984 59088 0 0 4 204 1056 408 45 5 50 0 1 0
> 0 375788 7984 59092 0 0 4 0 1003 330 87 4 9 0 0 0
> 0 375596 7984 59096 0 0 0 0 1002 281 74 3 23 0 0
> 0 0 375404 7992 59108 0 0 8 120 1024 289 43 4 39 15
> 1 0 0 375276 7992 59124 0 0 12 0 1005 394 39 3 57
> 1 4 0 0 375204 8000 59132 0 0 4 16 1006 1083 95 5
> 0 0 procs -----------memory---------- ---swap-- -----io---- --system--
> ----cpu---- r b swpd free buff cache si so bi bo in
> cs us sy id wa 3 0 0 375140 8000 59132 0 0 0 0 1002
> 95433 37 63 0 0 3 0 0 375060 8000 59136 0 0 0 0
> 1012 102262 33 67 0 0 6 0 0 375044 8008 59136 0 0 0
> 112 1025 83959 43 57 0 0 5 0 0 374988 8008 59136 0 0 0
> 0 1002 232 98 2 0 0 3 0 0 375052 8016 59136 0 0
> 0 20 1005 211 98 2 0 0 3 0 0 375052 8016 59136 0 0
> 0 0 1007 43272 71 29 0 0 2 0 0 375052 8016 59140 0 0
> 0 0 1023 82448 59 41 0 0 3 0 0 374988 8024 59140 0
> 0 0 84 1022 132541 39 61 0 0 3 0 0 374988 8024 59140 0
> 0 0 0 1002 21207 89 11 0 0 5 0 0 375004 8032 59144
> 0 0 0 20 1005 45680 75 25 0 0 3 0 0 375004 8032 59144
> 0 0 0 0 1012 24392 82 18 0 0 3 0 0 375004 8032
> 59144 0 0 0 0 1015 82373 48 52 0 0 2 0 0 375004
> 8032 59144 0 0 0 0 1005 77767 61 39 0 0 4 0 0 375020
> 8040 59144 0 0 0 68 1009 148047 31 69 0 0 3 0 0
> 375020 8048 59144 0 0 0 32 1008 127918 38 62 0 0 1 0
> 0 374956 8048 59144 0 0 0 0 1011 91847 39 61 0 0 3 0
> 0 374956 8048 59148 0 0 0 0 1002 94842 51 49 0 0 4 0
> 0 374972 8048 59148 0 0 0 0 1006 126960 38 62 0 0 2
> 0 0 374972 8056 59148 0 0 0 60 1009 76808 65 35 0 0
> 1 0 0 374972 8064 59148 0 0 0 184 1037 111781 32 68 0
> 0 2 0 0 374972 8064 59148 0 0 0 0 1002 120766 36 64
> 0 0
>
> 4) readprofile for a kernel with the problem (analysis during atmost 1min)
>
> c0191b55 proc_pid_stat 14 0.0108
> c01936cc write_profile 15 0.0794
> c0241e40 do_con_write 15 0.0081
> c0130e95 sys_tgkill 16 0.0438
> c0176dd2 __d_lookup 18 0.0496
> c018cf88 task_statm 18 0.1818
> c02037d2 __copy_user_intel 20 0.1163
> c02025f3 number 28 0.0452
> c014e3a8 do_anonymous_page 32 0.0516
> c012eab4 kill_something_info 175 0.5591
> c0109480 syscall_call 239 21.7273
> c01354ad find_task_by_pid 280 3.6364
> c012dd86 check_kill_permission 316 1.8588
> c010948b syscall_exit 336 30.5455
> c0109f34 device_not_available 360 8.5714
> c0111313 restore_i387_fxsave 403 2.4132
> c01209af __might_sleep 625 3.0340
> c01113ba restore_i387 673 4.6414
> c010870a restore_sigcontext 753 2.4770
> c0130e43 sys_kill 757 9.2317
> c0108a73 setup_sigcontext 935 3.1481
> c0114ffa sched_clock 1189 9.0076
> c0111111 save_i387_fxsave 1214 4.8367
> c011120c save_i387 1872 7.1179
> c0111027 convert_fxsr_from_user 1974 8.4359
> c012ea44 kill_proc_info 2217 19.7946
> c0130613 sys_rt_sigprocmask 2220 5.2857
> c0108b9c setup_frame 2250 4.8077
> c010883a sys_sigreturn 3405 13.3008
> c0110eca convert_fxsr_to_user 3620 10.3725
> c0109009 handle_signal 3927 11.6183
> c01084cc sys_rt_sigsuspend 4121 13.4235
> c020399a __copy_from_user_ll 6233 54.6754
> c010915b do_signal 6656 24.7435
> c020392a __copy_to_user_ll 7219 64.4554
> c01300d4 get_signal_to_deliver 9293 8.6447
> c012e484 group_send_sig_info 14040 13.2578
> c011e5a1 schedule 15029 8.2396
> c0109454 system_call 21609 491.1136
> 00000000 total 114435 0.0412
>
> 5) profile for a kernel without the problem (in fact a 2.6.3-rc2 without
> preempt)
>
> c027983c vgacon_cursor 24 0.0612
> c0108d64 syscall_call 25 2.2727
> c013a306 do_anonymous_page 25 0.0627
> c016e47c pid_revalidate 26 0.1486
> c01d7a86 __copy_user_intel 28 0.1628
> c010f2ed restore_i387_fxsave 31 0.2138
> c0127742 find_task_by_pid 31 0.4026
> c020d598 conv_uni_to_pc 34 0.1932
> c02c2b7f tcp_poll 36 0.0945
> c010f17d save_i387_fxsave 40 0.2062
> c0153114 link_path_walk 43 0.0205
> c016edac get_tgid_list 47 0.3072
> c0147da7 fget_light 49 0.4949
> c015c21f __d_lookup 51 0.2099
> c021308d do_con_write 53 0.0287
> c0123e02 sys_kill 57 0.6951
> c010842a setup_sigcontext 59 0.1987
> c01d6bba vsnprintf 63 0.0510
> c0108144 restore_sigcontext 66 0.2171
> c01702da proc_pid_stat 69 0.0681
> c0116bed __might_sleep 70 0.4516
> c0108553 setup_frame 72 0.1538
> c016c9de task_statm 83 0.8384
> c0112a5e sched_clock 90 0.6818
> c0108274 sys_sigreturn 93 0.4794
> c01236d1 sys_rt_sigprocmask 99 0.2750
> c01089c0 handle_signal 122 0.4342
> c010f093 convert_fxsr_from_user 143 0.6111
> c0107f42 sys_rt_sigsuspend 146 0.5911
> c01d694f number 157 0.2536
> c010ef36 convert_fxsr_to_user 179 0.5129
> c0108ad9 do_signal 463 1.8016
> c01d7c4e __copy_from_user_ll 476 4.1754
> c0123300 get_signal_to_deliver 522 0.6788
> c0121d45 group_send_sig_info 646 0.7242
> c01d7bde __copy_to_user_ll 695 6.2054
> c011577c schedule 889 0.6679
> c0108d38 system_call 1642 37.3182
> c0106d59 default_idle 53370 1241.1628
> 00000000 total 61980 0.0289
>
> 6) Sysreq-T during a freeze available on request.
> 7) .config or other info available on request
>
> Thank you for your help.
> PS: I'm not subscribed to the list, please CC: me if you answer.
>
> --
> Brice Figureau
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
-------------------------
Eric Bambach
Eric at cisu dot net
-------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/