Re: [PATCH for 3.2] memcg: do not trap chargers with full callstack on OOM

From: Michal Hocko
Date: Wed Jun 19 2013 - 09:26:24 EST


On Mon 17-06-13 12:21:34, azurIt wrote:
> >Here we go. I hope I didn't screw anything (Johannes might double check)
> >because there were quite some changes in the area since 3.2. Nothing
> >earth shattering though. Please note that I have only compile tested
> >this. Also make sure you remove the previous patches you have from me.
>
>
> Hi Michal,
>
> Unfortunately, it didn't work. Everything was working fine, but the
> original problem is still occurring.

This would be more than surprising because tasks blocked at memcg OOM
don't hold any locks anymore. Maybe I have messed something up during
the backport, but I cannot spot anything.

> I'm unable to send you stacks or more info because the problem has
> been taking down the whole server for some time now (I don't know what
> exactly caused it to start happening, maybe newer versions of 3.2.x).

So you are not testing with the same kernel with just the old patch
replaced by the new one?

> But I'm sure of one thing: when the problem occurs, nothing is able to
> access the hard drives (every process that tries is frozen until the
> problem is resolved or the server is rebooted).

It would be really interesting to see what those tasks are blocked on.

> The problem is fixed after killing the processes from the cgroup which
> caused it, and everything immediately starts to work normally. I
> found this out by keeping a terminal open from another server to the
> one where the problem occurs quite often and running several apps
> there (htop, iotop, etc.). When the problem occurred, all apps which
> weren't working with the HDD were OK. htop proved to be very useful
> here because it only reads the proc filesystem and is also able to
> send KILL signals; I was able to resolve the problem with it
> without rebooting the server.

sysrq+t will give you the list of all tasks and their traces.
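
If the console is not reachable when it happens, the same dump can be
requested by writing 't' to /proc/sysrq-trigger, and a monitoring
script could grab /proc/<pid>/stack for the tasks in the offending
group before killing them. A minimal sketch in Python, assuming root,
CONFIG_MAGIC_SYSRQ and CONFIG_STACKTRACE, and a cgroup v1 memory
hierarchy with a "tasks" file. The function names and the path
parameters are made up for illustration:

```python
import os


def trigger_sysrq_t(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to dump all tasks and their traces to the kernel log.

    Writing 't' here is equivalent to pressing sysrq+t; it needs root.
    """
    with open(trigger_path, "w") as f:
        f.write("t")


def read_cgroup_stacks(cgroup_dir, proc_root="/proc"):
    """Return {id: kernel stack text or None} for tasks in a cgroup.

    cgroup_dir is assumed to be a cgroup v1 directory containing a
    'tasks' file with one task ID per line.
    """
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        ids = [int(line) for line in f if line.strip()]

    stacks = {}
    for task_id in ids:
        try:
            with open(os.path.join(proc_root, str(task_id), "stack")) as f:
                stacks[task_id] = f.read()
        except OSError:
            # The task exited in the meantime or the stack is not readable.
            stacks[task_id] = None
    return stacks
```

Called as read_cgroup_stacks() with the directory of the group that
hit its limit, this only reads procfs and cgroupfs, so it should not
need the disks. Print the result to the remote terminal rather than
writing it to a local file.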

> I created a special daemon (about a month ago) which is able to detect
> and fix the problem, so I'm not having server outages now. The point
> was to NOT access anything which is stored on HDDs; the daemon
> only reads info from the cgroup filesystem and sends KILL signals to
> processes. Maybe I should also be able to read stack files before
> killing; I will try it.
>
> Btw, which vanilla kernel includes this patch?

None yet. But I hope it will be merged to 3.11 and backported to the
stable trees.

> Thank you and everyone involved very much for time and help.
>
> azur

--
Michal Hocko
SUSE Labs