Re: [PATCH 0/6] mm: make movable onlining suck less

From: Michal Hocko
Date: Wed Apr 05 2017 - 09:54:04 EST


On Tue 04-04-17 16:43:39, Reza Arbab wrote:
> On Tue, Apr 04, 2017 at 09:41:22PM +0200, Michal Hocko wrote:
> >On Tue 04-04-17 13:30:13, Reza Arbab wrote:
> >>I think I found another edge case. You
> >>get an oops when removing all of a node's memory:
> >>
> >>__nr_to_section
> >>__pfn_to_section
> >>find_biggest_section_pfn
> >>shrink_pgdat_span
> >>__remove_zone
> >>__remove_section
> >>__remove_pages
> >>arch_remove_memory
> >>remove_memory
> >
> >Is this something new or an old issue? I believe the state after the
> >online should be the same as before. So if you onlined the full node
> >then there shouldn't be any difference. Let me have a look...
>
> It's new. Without this patchset, I can repeatedly
> add_memory()->online_movable->offline->remove_memory() all of a node's
> memory.

OK, I know what is going on here.
shrink_pgdat_span: start_pfn=0x1ff00, end_pfn=0x20000, pgdat_start_pfn=0x0, pgdat_end_pfn=0x20000
[...]
find_biggest_section_pfn loop: pfn=0xff, sec_nr = 0x0

so the node starts at pfn 0 while we are trying to remove range starting
from pfn=255 (1MB). Rather than going with find_smallest_section_pfn we
go with the other branch and that underflows as already mentioned. I
seriously doubt that the node really starts at pfn 0. I am not sure
which arch you are testing on but I believe we reserve the lowest
address pfn range on all aches. The previous code presumably handled
that properly because the original node/zone has started at the lowest
possible address and the zone shifting then preserves that.

My code doesn't do that though. So I guess I have to sanitize. Does this
help? Please drop the "mm, memory_hotplug: get rid of zone/node
shrinking" patch.
---
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index acf2b5eb5ecb..2c5613d19eb6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -750,6 +750,15 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
int ret;
struct memory_notify arg;

+ do {
+ if (pfn_valid(pfn))
+ break;
+ pfn++;
+ } while (--nr_pages > 0);
+
+ if (!nr_pages)
+ return -EINVAL;
+
nid = pfn_to_nid(pfn);
if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
return -EINVAL;
--
Michal Hocko
SUSE Labs