RE: [RESEND PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat counter

From: PINTU KUMAR
Date: Thu Oct 15 2015 - 10:35:33 EST


Hi,

> -----Original Message-----
> From: David Rientjes [mailto:rientjes@xxxxxxxxxx]
> Sent: Thursday, October 15, 2015 3:35 AM
> To: PINTU KUMAR
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; minchan@xxxxxxxxxx; dave@xxxxxxxxxxxx;
> mhocko@xxxxxxx; koct9i@xxxxxxxxx; hannes@xxxxxxxxxxx; penguin-kernel@i-
> love.sakura.ne.jp; bywxiaobai@xxxxxxx; mgorman@xxxxxxx; vbabka@xxxxxxx;
> js1304@xxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx;
> alexander.h.duyck@xxxxxxxxxx; sasha.levin@xxxxxxxxxx; cl@xxxxxxxxx;
> fengguang.wu@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> cpgs@xxxxxxxxxxx; pintu_agarwal@xxxxxxxxx; pintu.ping@xxxxxxxxx;
> vishnu.ps@xxxxxxxxxxx; rohit.kr@xxxxxxxxxxx; c.rajkumar@xxxxxxxxxxx
> Subject: RE: [RESEND PATCH 1/1] mm: vmstat: Add OOM victims count in vmstat
> counter
>
> On Wed, 14 Oct 2015, PINTU KUMAR wrote:
>
> > For me it was very helpful during sluggishness and long-duration ageing
> > tests.
> > With this, I don't have to look into the logs manually.
> > I just monitor this count in a script.
> > The moment I see nr_oom_victims > 1, I know that a kernel OOM must have
> > happened and I need to take the log dump.
> > So, then I do: dmesg >> oom_logs.txt
> > Or, even stop the tests for further tuning.
> >
>
> I think eventfd(2) was created for that purpose, to avoid the constant polling
> that you would have to do to check nr_oom_victims and then take a snapshot.
>
> > > I disagree with this one, because we can encounter oom kills due to
> > > fragmentation rather than low memory conditions for high-order
> > > allocations.
> > > The amount of free memory may be substantially higher than all zone
> > > watermarks.
> > >
> > AFAIK, kernel oom happens only for lower-order allocations
> > (up to PAGE_ALLOC_COSTLY_ORDER).
> > For higher-order allocations we get a page allocation failure.
> >
>
> Order-3 is included. I've seen machines with _gigabytes_ of free memory in
> ZONE_NORMAL on a node and an order-3 page allocation failure that
> called the oom killer.
>
Yes, if PAGE_ALLOC_COSTLY_ORDER is defined as 3, then order-3 will be included
for OOM. But that's fine. We are just interested in knowing whether the system
entered the oom state.
That's the reason I earlier added _oom_stall_ as well: to know whether the
system ever entered oom but ended in a page allocation failure instead of an
oom kill.
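
Concretely, the monitoring script I mentioned boils down to something like
the following (a minimal userspace sketch, assuming the nr_oom_victims
counter added by this patch shows up in /proc/vmstat; the oom_logs.txt path
is just from my setup):

/* poll /proc/vmstat for nr_oom_victims and dump dmesg when it grows */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the value of a /proc/vmstat counter, or -1 if it is not found. */
static long vmstat_read(const char *name)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[128];
	long val = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		char key[64];
		long v;

		if (sscanf(line, "%63s %ld", key, &v) == 2 &&
		    strcmp(key, name) == 0) {
			val = v;
			break;
		}
	}
	fclose(f);
	return val;
}

int main(void)
{
	long last = vmstat_read("nr_oom_victims");

	for (;;) {
		long cur = vmstat_read("nr_oom_victims");

		if (cur > last) {
			/* an oom kill happened; snapshot the kernel log */
			system("dmesg >> oom_logs.txt");
			last = cur;
		}
		sleep(1);
	}
	return 0;
}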

> > > We've long had a desire to have a better oom reporting mechanism
> > > rather than just the kernel log. It seems like you're feeling the
> > > same pain. I think it would be
> > > better to have an eventfd notifier for system oom conditions so we
> > > can track kernel oom kills (and conditions) in userspace. I have a
> > > patch for that, and it
> > > works quite well when userspace is mlocked with a buffer in memory.
> > >
> > Ok, this would be interesting.
> > Can you point me to the patches?
> > I will quickly check if it is useful for us.
> >
>
> https://lwn.net/Articles/589404. It's invasive and isn't upstream. I would
> like to restructure that patchset to avoid the memcg trickery and allow for
> a root-only eventfd(2) notification through procfs on system oom.

I am interested only in the global oom case, not memcg. We have memcg enabled,
but I think even a memcg oom will finally invoke _oom_kill_process_.
So, I am interested in a patchset that can trigger a notification from
oom_kill_process as soon as any victim is killed.
Sorry, in your patchset I could not actually locate the system-oom
notification patch.
If you have a similar patchset, please point me to it.
It will be really helpful.
Thank you!
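
P.S.: Until a global notifier exists, the memcg side already has an
eventfd(2)-based oom notification in cgroup v1, which gives an idea of what
the system-wide equivalent could look like. A minimal sketch, assuming a v1
memory controller mounted at /sys/fs/cgroup/memory (the global procfs
interface discussed above does not exist yet):

/* block on an eventfd until the memcg reports an oom condition */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
	char buf[64];
	eventfd_t count;
	int efd = eventfd(0, 0);
	int ofd = open("/sys/fs/cgroup/memory/memory.oom_control", O_RDONLY);
	int cfd = open("/sys/fs/cgroup/memory/cgroup.event_control", O_WRONLY);

	if (efd < 0 || ofd < 0 || cfd < 0) {
		perror("open");
		return 1;
	}

	/* registering "<eventfd> <oom_control fd>" arms the notifier */
	snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
	if (write(cfd, buf, strlen(buf)) < 0) {
		perror("write cgroup.event_control");
		return 1;
	}

	/* blocks until the memcg hits oom: no polling needed */
	if (read(efd, &count, sizeof(count)) == sizeof(count))
		printf("memcg oom event (count=%llu)\n",
		       (unsigned long long)count);
	return 0;
}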
