Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)

From: Christian Borntraeger
Date: Fri Jan 29 2016 - 03:23:42 EST


On 01/29/2016 01:27 AM, Mike Kravetz wrote:
> On 01/28/2016 02:59 PM, Paul Gortmaker wrote:
>> [Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)] On 28/01/2016 (Thu 14:18) Mike Kravetz wrote:
>>
>>> On 01/28/2016 07:05 AM, Mike Kravetz wrote:
>>>> On 01/28/2016 06:37 AM, Paul Gortmaker wrote:
>>>>> [Re: Regression: 4.5-rc1 (bisect: hugetlb: make mm and fs code explicitly non-modular vs CONFIG_TIMER_STATS)] On 28/01/2016 (Thu 10:48) Christian Borntraeger wrote:
>>>>>
>>>>>> On 01/28/2016 10:40 AM, Hillf Danton wrote:
>>>>>>>>
>>>>>>>> Paul,
>>>>>>>>
>>>>>>>> the commit 3e89e1c5ea842 ("hugetlb: make mm and fs code explicitly non-modular")
>>>>>>>> triggers the warning/oops below, if CONFIG_TIMER_STATS is set.
>>>>>>>>
>>>>>>>> Looking at the patch the only "real" change is the init_call,
>>>>>>>> and indeed
>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>> @@ -2653,7 +2653,7 @@ static int __init hugetlb_init(void)
>>>>>>>> mutex_init(&hugetlb_fault_mutex_table[i]);
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> -subsys_initcall(hugetlb_init);
>>>>>>>> +device_initcall(hugetlb_init);
>>>>>>>>
>>>>>>>> /* Should be called on processing a hugepagesz=... option */
>>>>>>>> void __init hugetlb_add_hstate(unsigned int order)
>>>>>>>>
>>>>>>>> makes the problem go away.
>>>>>>>
>>>>>>> Helps more if a patch is delivered.
>>>>>>
>>>>>> The problem is that the original change was intentional. So I do not know
>>>>>> what the right fix is.
>>>>>
>>>>> Thanks for the report; let me see if I can work out what TIMER_STATS
>>>>> is doing to cause this sometime today.
>>>>>
>>>>
>>>> Hmmm? CONFIG_TIMER_STATS is set in my config and I am not seeing the
>>>> issue. Not sure, but it looks like Christian is building/running on
>>>> s390. This 'might' be a contributing factor.
>>>
>>> I do not see how CONFIG_TIMER_STATS contributes to this issue. However,
>>
>> I looked at all the TIMER_STATS ifdef blocks and was also thinking the
>> same thing. If it did toggle the problem then it was a red herring.
>> My test config had this set and I retested x86-64 today with it set.
>>
>>> on s390 numa nodes are initialized at device_initcall in the appropriately
>>> named routine numa_init_late(). hugetlb_init must be done after numa
>>> initialization. So, I suggest we just move the hugetlb initialization
>>> back to device_initcall. What do you think Paul? Patch below.
>>
>> We could, but that ignores the fact that the original priorities worked
>> by chance and not by design, as my commit log indicates. Instead, I'd
>> like to know why S390 does core NUMA operations as late as
>> device_initcall. Setting up NUMA nodes should be arch_initcall or
>> subsys_initcall, or earlier --- it should not be device_initcall as if
>> it was some leaf node UART driver or ethernet driver. There is no
>> endpoint "device" in NUMA in this context.
>
> This is in linux-next after 4.5-rc1
>
> commit 2d0f76a6ca1f2cdcffca7ce130f67ec61caa0999
> Author: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
> Date: Wed Jan 20 19:22:16 2016 +0100
>
> s390/numa: move numa_init_late() from device to arch_initcall
>
> Commit 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
> moves hugetlb_init() from module_init to subsys_initcall.
>
> The hugetlb_init()->hugetlb_register_node() code accesses "node->dev.kobj"
> which is initialized in numa_init_late().
>
> Since numa_init_late() is a device_initcall which is called *after*
> subsys_initcall the above mentioned patch breaks NUMA on s390.
>
> So fix this and move numa_init_late() to arch_initcall.
>
> Fixes: 3e89e1c5ea ("hugetlb: make mm and fs code explicitly non-modular")
> Reviewed-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
> Signed-off-by: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
>


Ah, OK, thanks. Yes, that makes a lot of sense. Thanks for pointing me
to this patch.
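
For anyone following along, the ordering problem described in the quoted
commit message can be modelled in user space. This is only a rough sketch,
not kernel code: model_boot(), struct boot_state and the model_* functions
are hypothetical stand-ins, and ties within one level are resolved in table
order as an assumed substitute for link order. The level numbers follow the
kernel's initcall levels (arch = 3, subsys = 4, device = 6, late = 7).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Initcall levels, numbered as in the kernel; lower levels run first. */
enum initcall_level {
	LEVEL_ARCH   = 3,	/* arch_initcall() */
	LEVEL_SUBSYS = 4,	/* subsys_initcall() */
	LEVEL_DEVICE = 6,	/* device_initcall(), also built-in module_init() */
	LEVEL_MAX    = 7	/* late_initcall() */
};

/* Hypothetical stand-in for the boot-time state both initcalls touch. */
struct boot_state {
	bool node_kobj_ready;	/* models node->dev.kobj being initialized */
	bool hugetlb_ok;	/* true if hugetlb init found the node ready */
};

typedef void (*initcall_fn)(struct boot_state *);

struct initcall_entry {
	enum initcall_level level;
	initcall_fn fn;
};

/* Models numa_init_late(): sets up the per-node device object. */
static void model_numa_init_late(struct boot_state *s)
{
	s->node_kobj_ready = true;
}

/* Models hugetlb_init(): only succeeds if the node is already set up. */
static void model_hugetlb_init(struct boot_state *s)
{
	s->hugetlb_ok = s->node_kobj_ready;
}

/*
 * Run both model initcalls level by level and report whether the hugetlb
 * stand-in found an initialized node. Within one level, table order is an
 * assumed stand-in for link order.
 */
bool model_boot(enum initcall_level numa_level, enum initcall_level hugetlb_level)
{
	struct boot_state state = { false, false };
	struct initcall_entry table[] = {
		{ numa_level, model_numa_init_late },
		{ hugetlb_level, model_hugetlb_init },
	};
	int level;
	size_t i;

	assert(numa_level <= LEVEL_MAX && hugetlb_level <= LEVEL_MAX);

	for (level = 0; level <= LEVEL_MAX; level++)
		for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
			if ((int)table[i].level == level)
				table[i].fn(&state);

	return state.hugetlb_ok;
}
```

In this model, model_boot(LEVEL_DEVICE, LEVEL_SUBSYS) returns false, which
corresponds to what I saw on s390 with 4.5-rc1, and
model_boot(LEVEL_ARCH, LEVEL_SUBSYS) returns true, which corresponds to the
s390/numa patch quoted above.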