Re: [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOMis set

From: Michal Hocko
Date: Fri Feb 08 2013 - 07:39:05 EST


On Fri 08-02-13 12:02:49, azurIt wrote:
> >
> >Do you have logs from that time period?
> >
> >I have only glanced through the stacks and most of the threads are
> >waiting in the mem_cgroup_handle_oom (mostly from the page fault path
> >where we do not have other options than waiting) which suggests that
> >your memory limit is seriously underestimated. If you look at the number
> >of charging failures (memory.failcnt per-group file) then you will get
> >9332083 failures in _average_ per group. This is a lot!
> >Not all those failures end with OOM, of course. But it clearly signals
> >that the workload need much more memory than the limit allows.
>
>
> What type of logs? I have all.

kernel log would be sufficient.

> Memory usage graph:
> http://www.watchdog.sk/lkml/memory2.png
>
> New kernel was booted about 1:15. Data in memcg-bug-4.tar.gz were taken about 2:35 and data in memcg-bug-5.tar.gz about 5:25. There was always lots of free memory. Higher memory consumption between 3:39 and 5:33 was caused by data backup and was completed few minutes before i restarted the server (this was just a coincidence).
>
>
>
> >There are only 5 groups in this one and all of them have no memory
> >charged (so no OOM going on). All tasks are somewhere in the ptrace
> >code.
>
>
> It's all from the same cgroup but from different time.
>
>
>
> >grep cache -r .
> >./1360297489/memory.stat:cache 0
> >./1360297489/memory.stat:total_cache 65642496
> >./1360297491/memory.stat:cache 0
> >./1360297491/memory.stat:total_cache 65642496
> >./1360297492/memory.stat:cache 0
> >./1360297492/memory.stat:total_cache 65642496
> >./1360297490/memory.stat:cache 0
> >./1360297490/memory.stat:total_cache 65642496
> >./1360297488/memory.stat:cache 0
> >./1360297488/memory.stat:total_cache 65642496
> >
> >which suggests that this is a parent group and the memory is charged in
> >a child group. I guess that all those are under OOM as the number seems
> >like they have limit at 62M.
>
>
> The cgroup has limit 330M (346030080 bytes).

This limit is for top level groups, right? Those seem to children which
have 62MB charged - is that a limit for those children?

> As i said, these two processes

Which are those two processes?

> were stucked and was impossible to kill them. They were,
> maybe, the processes which i was trying to 'strace' before - 'strace'
> was freezed as always when the cgroup has this problem and i killed it
> (i was just trying if it is the original cgroup problem).

I have no idea what is the strace role here.
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/