Re: [patch 0/7] improve memcg oom killer robustness v2

From: azurIt
Date: Tue Sep 17 2013 - 07:20:25 EST


> CC: "Michal Hocko" <mhocko@xxxxxxx>, "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx>, "David Rientjes" <rientjes@xxxxxxxxxx>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@xxxxxxxxxxxxxx>, "KOSAKI Motohiro" <kosaki.motohiro@xxxxxxxxxxxxxx>, linux-mm@xxxxxxxxx, cgroups@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, linux-arch@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>On Mon, Sep 16, 2013 at 10:52:46PM +0200, azurIt wrote:
>> > CC: "Johannes Weiner" <hannes@xxxxxxxxxxx>, "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx>, "David Rientjes" <rientjes@xxxxxxxxxx>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@xxxxxxxxxxxxxx>, "KOSAKI Motohiro" <kosaki.motohiro@xxxxxxxxxxxxxx>, linux-mm@xxxxxxxxx, cgroups@xxxxxxxxxxxxxxx, x86@xxxxxxxxxx, linux-arch@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>> >On Mon 16-09-13 17:05:43, azurIt wrote:
>> >> >On Mon 16-09-13 16:13:16, azurIt wrote:
>> >> >[...]
>> >> >> >You can use sysrq+l via serial console to see tasks hogging the CPU or
>> >> >> >sysrq+t to see all the existing tasks.
>> >> >>
>> >> >>
>> >> >> Doesn't work here; it just prints 'l' or 't', respectively.
>> >> >
>> >> >I am using telnet for accessing my serial consoles exported by
>> >> >the multiplicator or KVM and it can send sysrq via ctrl+t (Send
>> >> >Break). Check your serial console setup.
>> >>
>> >>
>> >>
>> >> I'm using a Raritan KVM and I created keyboard macros for 'sysrq + l'
>> >> and 'sysrq + t'. I'm also unable to use it on my local PC. Maybe it
>> >> needs to be enabled somehow?
>> >
>> >Probably yes. echo 1 > /proc/sys/kernel/sysrq should enable all sysrq
>> >commands. You can also enable only some of them (have a look at
>> >Documentation/sysrq.txt for more information).
>>
>>
>> It just happened again while I was watching htop on the server. I'm
>> sure that this time it was only one process (apache) running under a
>> user account (not root). It was taking about 100% CPU (about 100% of
>> one core). I was able to kill it by hand inside htop, but everything
>> was very slow and the server load immediately went to 500. I'm sure it
>> must be related to Johannes' kernel patches, because I'm also
>> throttling I/O in cgroups via the Block IO controller, so users are
>> unable to generate that much I/O. I will try to capture the stacks of
>> the processes, but I can't identify the problematic one, so I will
>> have to take them from *all* apache processes while killing them.
>
>It would be fantastic if you could capture those stacks. sysrq+t
>captures ALL of them in one go and drops them into your syslog.
>
>/proc/<pid>/stack for individual tasks works too.
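The per-task approach suggested above could be scripted roughly as follows.
This is only a sketch under a few assumptions: the function name dump_stacks
and its optional second argument (an alternative proc root, handy for a dry
run) are made up for illustration, the kernel must expose /proc/<pid>/stack
(which normally needs root and stack trace support built in), and the
command name to match ("apache2" here) may well be "httpd" or something else
on a given system.

```shell
#!/bin/sh
# Sketch: print the kernel stack of every task whose command name matches.
# dump_stacks NAME [PROC_ROOT]
#   NAME      - command name as shown in /proc/<pid>/comm (assumed: apache2)
#   PROC_ROOT - hypothetical override of /proc, used only for dry runs
dump_stacks() {
    name="$1"
    root="${2:-/proc}"
    for dir in "$root"/[0-9]*; do
        # Skip entries that vanished or that we are not allowed to read.
        [ -r "$dir/comm" ] || continue
        # comm holds the (kernel-truncated) executable name of the task.
        [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
        echo "=== pid ${dir##*/} ($name) ==="
        # The task may exit between the two reads, so tolerate failure.
        cat "$dir/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Run as root, something like "dump_stacks apache2 > /tmp/apache-stacks.txt"
right before killing the processes would keep one file with all the stacks.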



Btw, this is what it looked like:
http://watchdog.sk/lkml/htop2.jpg

azur