Re: [patch 0/7] improve memcg oom killer robustness v2

From: azurIt
Date: Wed Sep 04 2013 - 03:54:00 EST


>On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >>Hi azur,
>> >>
>> >>here is the x86-only rollup of the series for 3.2.
>> >>
>> >>Thanks!
>> >>Johannes
>> >>---
>> >
>> >
>> >Johannes,
>> >
>> >unfortunately, one problem arises: I (again) have a cgroup which cannot be deleted :( It belongs to a user who had very high memory usage and was reaching his limit very often. Do you need any info which I can gather now?
>
>Did the OOM killer go off in this group?
>



# cat /cgroups/cannot_rm_01/memory.oom_control
oom_kill_disable 0
under_oom 1
#




>Was there a warning in the syslog ("Fixing unhandled memcg OOM
>context")?



I really don't know, because I don't know the exact day when it happened. I only found out on 30.8., but it could have happened any time before that. Uptime on that server is 27 days, so I can grep all the syslog logs I have if it helps. I just need to find the original name of that cgroup, because I renamed it to 'cannot_rm_01' so that my software will ignore it.
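
A minimal sketch of how such a search over the logs could look. The function name and the log paths in the usage comment are only assumptions for illustration; the message text is the one quoted above.

```shell
# Hypothetical helper: print the names of log files that contain the
# memcg warning quoted above.
# Usage: find_memcg_warning FILE...
find_memcg_warning() {
    # -F: treat the message as a fixed string, not a regex
    # -l: print only the names of files with at least one match
    grep -F -l "Fixing unhandled memcg OOM context" "$@" 2>/dev/null
}

# Example (assumed log location; gzip-compressed rotations would need
# zgrep instead of grep):
# find_memcg_warning /var/log/syslog /var/log/syslog.1
```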



>If it happens again, could you check if there are tasks left in the
>cgroup? And provide /proc/<pid>/stack of the hung task trying to
>delete the cgroup?



# cat /cgroups/cannot_rm_01/tasks
#



>> Now I can definitely confirm that the problem is NOT fixed :( It happened again, but I don't have any data because I had already disabled all debug output.
>
>Which debug output?



Debug output from my own scripts, which are supposed to handle this situation and kill frozen processes. I have already reactivated it; it grabs the content of 'stack' from all processes before killing them.
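
For illustration only, a simplified sketch of what such a collection step could look like. This is not the actual script: the function name and the optional second argument (an alternative proc root, handy for testing) are assumptions. Reading /proc/<pid>/stack typically requires root.

```shell
# Hypothetical helper: print the kernel stack of every task listed in a
# cgroup's "tasks" file.
# Usage: dump_cgroup_stacks CGROUP_DIR [PROC_ROOT]
dump_cgroup_stacks() {
    cgroup_dir=$1
    proc_root=${2:-/proc}   # assumption: overridable only for testing
    while read -r pid; do
        # skip empty lines
        [ -n "$pid" ] || continue
        echo "== pid $pid =="
        # the task may exit between reading the list and reading its stack
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done < "$cgroup_dir/tasks"
}

# Example, to be run before the processes are killed:
# dump_cgroup_stacks /cgroups/cannot_rm_01 > /tmp/stacks.txt
```

With an empty tasks file, as in the listing above, this prints nothing.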



>Do you still have access to the syslog?



From that day (30.8.)? Yes.


>It's possible that, as your system does not deadlock on the OOMing
>cgroup anymore, you hit a separate bug...
>
>Thanks!
>