Re: [PATCH v4] mm: page_alloc: dump migrate-failed pages

From: David Hildenbrand
Date: Thu Mar 11 2021 - 13:55:17 EST


On 11.03.21 19:30, Minchan Kim wrote:
Currently, debugging CMA allocation failures is quite limited.
The most commong source of these failures seems to be page

s/commong/common/

migration which doesn't provide any useful information on the
reason of the failure by itself. alloc_contig_range can report
those failures as it holds a list of migrate-failed pages.

page refcount, mapcount with page flags on dump_page are
helpful information to deduce the culprit. Furthermore,
dump_page_owner was super helpful to find long term pinner
who initiated the page allocation.

Maybe simply "The information logged by dump_page() has already proven helpful for debugging allocation issues, like identifying long-term pinnings on ZONE_MOVABLE or MIGRATE_CMA."


The reason it approach with dynamic debug is the debug message
could emit lots of noises as alloc_contig_range calls more
frequently since it's a best effort allocator.

"Let's use the dynamic debugging infrastructure, such that we avoid
flooding the logs and creating a lot of noise on frequent alloc_contig_range() calls. This information is helpful for debugging only."

>>>


There are two ifdefery conditions to support common dyndbg options:

- CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE
It aims for supporting the feature with only specific file
with adding ccflags.

- CONFIG_DYNAMIC_DEBUG
It aims for supporting the feature with system wide globally.

A simple example to enable the feature:

Admin could enable the dump like this(by default, disabled)

echo "func alloc_contig_dump_pages +p" > control

Admin could disable it.

echo "func alloc_contig_dump_pages =_" > control

Detail goes Documentation/admin-guide/dynamic-debug-howto.rst

<<< I'd drop that completely and only mention:

"For details on dynamic debugging, see Documentation/admin-guide/dynamic-debug-howto.rst."

As you have usage in the code itself, I think you don't have to be repetitive. The ifdeffery seems to be common (e.g., include/linux/netdevice.) for dynamic debugging users, so I don't see the need to describe that in detail.

>>>


A concern is utility functions in dump_page uses inconsistent
loglevels.

__dump_page: KERN_WARNING
__dump_page_owner: KERN_ALERT
stack_trace_print: KERN_DEFAULT

There are bunch of places to use the inconsistent loglevel
utility functions(e.g., just grep dump_page/strace_trace_print).
It's unfortunate but here we are. It could be addressed
different patchset.

<<< I'd drop that completely and mention

"In the future, we might want to make the loglevels used inside dump_page() consistent and eventually rework the way we log the information here. See [1]"

Where [1] is a link to the discussion.


Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
* from v3 - https://lore.kernel.org/linux-mm/20210310180104.517886-1-minchan@xxxxxxxxxx
* add dyndgb usage comment - akpm
* use dumpstack instead of warn_on - david

* from v2 - https://lore.kernel.org/linux-mm/20210308202047.1903802-1-minchan@xxxxxxxxxx/
* remove ratelimit - mhocko

* from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@xxxxxxxxxx/
* use dynamic debugging with system wide instead of per-call site - mhocko

mm/page_alloc.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)


Minor nits:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..76fc202cb105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,36 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
pageblock_nr_pages));
}
+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+ (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+/*
+ * usage)

"usage)" looks wrong here. Did you mean "Usage:"

+ * dyndbg_dir="/sys/kernel/debug/dynamic_debug"
+ * To enable:
+ * echo "func alloc_contig_dump_pages +p" > $dyndbg_dir/control
+ * To disable:
+ * echo "func alloc_contig_dump_pages =_" > $dyndbg_dir/control
+ * For detail, read dynamic-debug-howto.rst

Maybe simply

"See admin-guide/dynamic-debug-howto.rst"

+ */
+static void alloc_contig_dump_pages(struct list_head *page_list)
+{
+ DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+ "migrate failure");

You can fit that into a single line.

+
+ if (DYNAMIC_DEBUG_BRANCH(descriptor)) {
+ struct page *page;
+
+ dump_stack();
+ list_for_each_entry(page, page_list, lru)
+ dump_page(page, "migration failure");
+ }
+}
+#else
+static inline void alloc_contig_dump_pages(struct list_head *page_list)
+{
+}
+#endif
+
/* [start, end) must belong to a single zone. */
static int __alloc_contig_migrate_range(struct compact_control *cc,
unsigned long start, unsigned long end)
@@ -8496,6 +8526,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
+ alloc_contig_dump_pages(&cc->migratepages);
putback_movable_pages(&cc->migratepages);
return ret;
}


As I said, for my taste good enough for now. I would certainly preferred what Michal suggested (e.g., doing it via debug loglevels), but this gets the job done and is not too ugly.

Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

--
Thanks,

David / dhildenb