Re: [PATCH] mm: be more verbose for alloc_contig_range failures

From: David Hildenbrand
Date: Mon Mar 08 2021 - 08:23:02 EST


On 08.03.21 13:49, Michal Hocko wrote:
> On Thu 04-03-21 10:22:51, Minchan Kim wrote:
> [...]
>> How about this?
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 238d0fc232aa..489e557b9390 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8481,7 +8481,8 @@ static inline void dump_migrate_failure_pages(struct list_head *page_list)
>>
>> /* [start, end) must belong to a single zone. */
>> static int __alloc_contig_migrate_range(struct compact_control *cc,
>> - unsigned long start, unsigned long end)
>> + unsigned long start, unsigned long end,
>> + bool nofail)
>
> This sounds like a very bad idea to me. Your nofail definition might
> differ from what we actually define as __GFP_NOFAIL but I do not think
> this interface should ever promise anything that strong.
> Sure movable, cma regions should effectively never fail but there will
> never be any _guarantee_ for that.

While there are no guarantees, we want to make such allocations as likely as possible to succeed. Not succeeding should be the corner case and is worth investigating.

> Earlier in the discussion I have suggested dynamic debugging facility.
> Documentation/admin-guide/dynamic-debug-howto.rst. Have you tried to
> look into that direction?

Did you see the previous mail this is based on:

https://lkml.kernel.org/r/YEEUq8ZRn4WyYWVx@xxxxxxxxxx

I agree that "nofail" is misleading. Something like "dump_on_failure" would fit better, ideally with a nicer name :)
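
To illustrate the naming point, here is a minimal userspace sketch, not kernel code. The helper names (count_unmovable, migrate_range_sketch) and the "every fourth pfn is pinned" rule are hypothetical stand-ins. The only thing it is meant to show is that a flag like "dump_on_failure" changes what gets reported, while the return value stays the same, so the caller is promised nothing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the migration step: pretend that every page
 * frame number divisible by 4 is pinned and cannot be migrated.
 * Returns the number of pages in [start, end) that failed to migrate.
 */
static int count_unmovable(unsigned long start, unsigned long end,
			   bool dump_on_failure)
{
	int failed = 0;

	assert(start <= end);
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (pfn % 4 != 0)
			continue;
		failed++;
		/* The flag only controls reporting, never the outcome. */
		if (dump_on_failure)
			fprintf(stderr, "migration failed for pfn %lu\n", pfn);
	}
	return failed;
}

/* [start, end) as in the quoted patch; returns 0 on success, -1 on failure. */
static int migrate_range_sketch(unsigned long start, unsigned long end,
				bool dump_on_failure)
{
	return count_unmovable(start, end, dump_on_failure) ? -1 : 0;
}
```

With this shape, migrate_range_sketch(1, 5, true) and migrate_range_sketch(1, 5, false) both fail. The first one additionally prints which pfn was responsible, which is what a caller would want when a movable or CMA allocation unexpectedly fails.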

--
Thanks,

David / dhildenb