Re: [PATCH] mm, memory hotplug: print more failure information for online_pages

From: Chen Yucong
Date: Wed Feb 24 2016 - 20:07:35 EST


On Wed, 2016-02-24 at 13:33 -0800, David Rientjes wrote:
> On Wed, 24 Feb 2016, Chen Yucong wrote:
>
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index c832ef3..e4b6dec3 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1059,10 +1059,9 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> >
> > ret = memory_notify(MEM_GOING_ONLINE, &arg);
> > ret = notifier_to_errno(ret);
> > - if (ret) {
> > - memory_notify(MEM_CANCEL_ONLINE, &arg);
> > - return ret;
> > - }
> > + if (ret)
> > + goto failed_addition;
> > +
> > /*
> > * If this zone is not populated, then it is not in zonelist.
> > * This means the page allocator ignores this zone.
> > @@ -1080,12 +1079,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> > if (need_zonelists_rebuild)
> > zone_pcp_reset(zone);
> > mutex_unlock(&zonelists_mutex);
> > - printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n",
> > - (unsigned long long) pfn << PAGE_SHIFT,
> > - (((unsigned long long) pfn + nr_pages)
> > - << PAGE_SHIFT) - 1);
> > - memory_notify(MEM_CANCEL_ONLINE, &arg);
> > - return ret;
> > + goto failed_addition;
> > }
> >
> > zone->present_pages += onlined_pages;
> > @@ -1118,6 +1112,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
> > if (onlined_pages)
> > memory_notify(MEM_ONLINE, &arg);
> > return 0;
> > +
> > +failed_addition:
> > + pr_info("online_pages [mem %#010llx-%#010llx] failed\n",
> > + (unsigned long long) pfn << PAGE_SHIFT,
> > + (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
> > + memory_notify(MEM_CANCEL_ONLINE, &arg);
> > + return ret;
> > }
> > #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
> >
>
> Please explain how the conversion from KERN_DEBUG to KERN_INFO level is
> better?

Like __offline_pages(), printk() in online_pages() is used for reporting
an failed addition rather than debug information.
Another reason is that pr_debug() is not an exact equivalent of
printk(KERN_DEBUG ...)

/* If you are writing a driver, please use dev_dbg instead */
#if defined(CONFIG_DYNAMIC_DEBUG)
/* dynamic_pr_debug() uses pr_fmt() internally so we don't need it here
*/
#define pr_debug(fmt, ...) \
dynamic_pr_debug(fmt, ##__VA_ARGS__)
#elif defined(DEBUG)
#define pr_debug(fmt, ...) \
printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#else
#define pr_debug(fmt, ...) \
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#endif


> If the onlining returns an error value, which it will, why do we need to
> leave an artifact behind in the kernel log that it failed?

In __offline_pages(), we can find the following snippet:

...
ret = memory_notify(MEM_GOING_OFFLINE, &arg);
ret = notifier_to_errno(ret);
if (ret)
goto failed_removal;
...
offlined_pages = check_pages_isolated(start_pfn, end_pfn);
if (offlined_pages < 0) {
ret = -EBUSY;
goto failed_removal;
}
...
failed_removal:
printk(KERN_INFO "memory offlining [mem %#010llx-%#010llx]
...

Similarly, there's no single cause for failed online_pages operation.
So if memory_notify(MEM_GOING_ONLINE, &arg) returns an error
value, the result of online_pages is also ""online_pages [mem %#010llx-%
#010llx] failed\n".

thx!
cyc