Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)

From: Mike Kravetz
Date: Thu Jan 28 2016 - 19:28:05 EST


On 01/28/2016 02:59 PM, Paul Gortmaker wrote:
> [Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)] On 28/01/2016 (Thu 14:18) Mike Kravetz wrote:
>
>> On 01/28/2016 07:05 AM, Mike Kravetz wrote:
>>> On 01/28/2016 06:37 AM, Paul Gortmaker wrote:
>>>> [Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)] On 28/01/2016 (Thu 10:48) Christian Borntraeger wrote:
>>>>
>>>>> On 01/28/2016 10:40 AM, Hillf Danton wrote:
>>>>>>>
>>>>>>> Paul,
>>>>>>>
>>>>>>> commit 3e89e1c5ea842 ("hugetlb: make mm and fs code explicitly non-modular")
>>>>>>> triggers the warning/oops below if CONFIG_TIMER_STATS is set.
>>>>>>>
>>>>>>> Looking at the patch the only "real" change is the init_call,
>>>>>>> and indeed
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -2653,7 +2653,7 @@ static int __init hugetlb_init(void)
>>>>>>>  		mutex_init(&hugetlb_fault_mutex_table[i]);
>>>>>>>  	return 0;
>>>>>>>  }
>>>>>>> -subsys_initcall(hugetlb_init);
>>>>>>> +device_initcall(hugetlb_init);
>>>>>>>
>>>>>>> /* Should be called on processing a hugepagesz=... option */
>>>>>>> void __init hugetlb_add_hstate(unsigned int order)
>>>>>>>
>>>>>>> makes the problem go away.
>>>>>>
>>>>>> Helps more if a patch is delivered.
>>>>>
>>>>> The problem is that the original change was intentional. So I do not know
>>>>> what the right fix is.
>>>>
>>>> Thanks for the report; let me see if I can work out what TIMER_STATS
>>>> is doing to cause this sometime today.
>>>>
>>>
>>> Hmmm? CONFIG_TIMER_STATS is set in my config and I am not seeing the
>>> issue. Not sure, but it looks like Christian is building/running on
>>> s390. This 'might' be a contributing factor.
>>
>> I do not see how CONFIG_TIMER_STATS contributes to this issue. However,
>
> I looked at all the TIMER_STATS ifdef blocks and was also thinking the
> same thing. If it did toggle the problem then it was a red herring.
> My test config had this set and I retested x86-64 today with it set.
>
>> on s390 numa nodes are initialized at device_initcall in the appropriately
>> named routine numa_init_late(). hugetlb_init must be done after numa
>> initialization. So, I suggest we just move the hugetlb initialization
>> back to device_initcall. What do you think Paul? Patch below.
>
> We could, but that ignores the fact that the original priorities worked
> by chance and not by design, as my commit log indicates. Instead, I'd
> like to know why S390 does core NUMA operations as late as
> device_initcall. Setting up NUMA nodes should be arch_initcall or
> subsys_initcall, or earlier --- it should not be device_initcall as if
> it was some leaf node UART driver or ethernet driver. There is no
> endpoint "device" in NUMA in this context.

The following commit went into linux-next after 4.5-rc1:

commit 2d0f76a6ca1f2cdcffca7ce130f67ec61caa0999
Author: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
Date: Wed Jan 20 19:22:16 2016 +0100

    s390/numa: move numa_init_late() from device to arch_initcall

    Commit 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
    moves hugetlb_init() from module_init to subsys_initcall.

    The hugetlb_init()->hugetlb_register_node() code accesses "node->dev.kobj"
    which is initialized in numa_init_late().

    Since numa_init_late() is a device_initcall which is called *after*
    subsys_initcall the above mentioned patch breaks NUMA on s390.

    So fix this and move numa_init_late() to arch_initcall.

    Fixes: 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
    Reviewed-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
    Signed-off-by: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
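
To make the ordering argument concrete, here is a minimal userspace sketch (not kernel code) of how initcall levels decide which routine runs first. The level numbering is meant to mirror the initcall macros in include/linux/init.h; everything else (struct initcall, run_order(), runs_before(), and the two tables) is a hypothetical name made up for this illustration and only loosely imitates what do_initcalls() does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Initcall levels, lowest runs first (assumed to match include/linux/init.h). */
enum initcall_level {
	LEVEL_PURE = 0,
	LEVEL_CORE,
	LEVEL_POSTCORE,
	LEVEL_ARCH,	/* arch_initcall */
	LEVEL_SUBSYS,	/* subsys_initcall */
	LEVEL_FS,
	LEVEL_DEVICE,	/* device_initcall; built-in module_init maps here */
	LEVEL_LATE,
	LEVEL_COUNT
};

/* Hypothetical table entry: a routine name and the level it registered at. */
struct initcall {
	enum initcall_level level;
	const char *name;
};

/* Toy stand-in for the boot-time walk: fill out[] with names in run order. */
static size_t run_order(const struct initcall *calls, size_t n,
			const char **out)
{
	size_t k = 0;

	for (int level = 0; level < LEVEL_COUNT; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				out[k++] = calls[i].name;
	assert(k == n);	/* every entry must carry a valid level */
	return k;
}

/* Return 1 if `first` runs before `second`, 0 otherwise. */
static int runs_before(const struct initcall *calls, size_t n,
		       const char *first, const char *second)
{
	const char *order[16];
	size_t count, i;

	assert(n <= 16);
	count = run_order(calls, n, order);
	for (i = 0; i < count; i++) {
		if (strcmp(order[i], first) == 0)
			return 1;
		if (strcmp(order[i], second) == 0)
			return 0;
	}
	return 0;
}

/* s390 with 3e89e1c5ea applied: hugetlb at subsys, NUMA still at device. */
static const struct initcall before_fix[] = {
	{ LEVEL_SUBSYS, "hugetlb_init" },
	{ LEVEL_DEVICE, "numa_init_late" },
};

/* s390 after the s390/numa commit: NUMA moved up to arch. */
static const struct initcall after_fix[] = {
	{ LEVEL_SUBSYS, "hugetlb_init" },
	{ LEVEL_ARCH, "numa_init_late" },
};
```

With these tables, runs_before(before_fix, 2, "hugetlb_init", "numa_init_late") is true, which is the broken ordering where hugetlb_init() touches node->dev.kobj too early. With after_fix the NUMA setup runs first.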

--
Mike Kravetz