Re: [PATCH 0/3] OOM detection rework v4

From: Vlastimil Babka
Date: Wed Mar 02 2016 - 08:22:41 EST


On 03/02/2016 01:24 PM, Michal Hocko wrote:
On Tue 01-03-16 19:14:08, Vlastimil Babka wrote:

I was under the impression that checks similar to compaction_suitable() were
also done in compact_finished(), to stop compacting if memory got low due to
parallel activity. But I guess that was a patch from Joonsoo that didn't get
merged.

My only other theory so far is that watermark checks fail in
__isolate_free_page() when we want to grab page(s) as migration targets.

Yes, this certainly contributes to the problem, and it triggered a lot in
my case:
$ grep __isolate_free_page trace.log | wc -l
181
$ grep __alloc_pages_direct_compact: trace.log | wc -l
7

I would suggest enabling all compaction tracepoints and the migration
tracepoint. Looking at the trace could hopefully help faster than
going one trace_printk() per attempt.

OK, here we go with both watermark checks removed and hopefully all the
compaction-related tracepoints enabled:
echo 1 > /debug/tracing/events/compaction/enable
echo 1 > /debug/tracing/events/migrate/mm_migrate_pages/enable

The trace shows only 4 direct compaction attempts with order=2. The rest are order=9, i.e. THP, which has little chance of success under such pressure, hence those failures and defers. The few order=2 attempts all appear successful (defer_reset is called).
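
For anyone who wants to produce the same kind of per-order tally from their own trace, here is a minimal sketch. It assumes that each direct compaction attempt shows up as one line containing the event name and an "order=N" field, as the mm_compaction_try_to_compact_pages tracepoint should print it. The helper name count_by_order is made up for illustration, and the exact line format may differ between kernel versions.

```shell
# count_by_order EVENT FILE
# Hypothetical helper: print "order=N <count>" for every line in FILE
# that mentions EVENT. Assumes one "order=N" field per matching line.
count_by_order() {
    grep -F -- "$1" "$2" |        # keep only lines for the given event
        grep -o 'order=[0-9]*' |  # extract just the order=N field
        sort | uniq -c |          # count identical order values
        awk '{ print $2, $1 }'    # reorder to "order=N <count>"
}
```

It could be run as "count_by_order mm_compaction_try_to_compact_pages trace.log", which prints one line per order seen in the trace.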

So it seems your system is mostly fine with just reclaim, and there's little need for order-2 compaction, which is also why you can't reproduce the OOMs. So I'm afraid we'll learn nothing here, and it looks like Hugh will have to try those watermark check adjustments/removals and/or provide the same kind of trace.