Re: [PATCH] mm/hugetlb: avoid weird message in hugetlb_init

From: Mike Kravetz
Date: Fri Mar 06 2020 - 15:12:38 EST


On 3/5/20 10:36 PM, Longpeng (Mike) wrote:
> On 2020/3/6 8:09, Mike Kravetz wrote:
>> On 3/4/20 7:30 PM, Longpeng(Mike) wrote:
>>> From: Longpeng <longpeng2@xxxxxxxxxx>
>
>> I am thinking we may want to have a more generic solution by allowing
>> the default_hugepagesz= processing code to verify the passed size and
>> set up the corresponding hstate. This would require more cooperation
>> between architecture specific and independent code. This could be
>> accomplished with a simple arch_hugetlb_valid_size() routine provided
>> by the architectures. Below is an untested patch to add such support
>> to the architecture independent code and x86. Other architectures would
>> be similar.
>>
>> In addition, with architectures providing arch_hugetlb_valid_size() it
>> should be possible to have a common routine in architecture independent
>> code to read/process hugepagesz= command line arguments.
>>
> I just wanted to address this issue with minimal changes, so I chose the
> approach in my patch.
>
> To be honest, the approach you suggested above is much better, though it needs
> more changes.
>
>> Of course, another approach would be to simply require ALL architectures
>> to set up hstates for ALL supported huge page sizes.
>>
> I think this is also needed; then we can request all supported hugepage sizes
> dynamically through sysfs (e.g. /sys/kernel/mm/hugepages/*). Currently, on x86,
> we can only request 1G hugepages through sysfs if we boot with
> 'default_hugepagesz=1G', even with the first approach.

I 'think' you can use sysfs for 1G huge pages on x86 today. Just booted a
system without any hugepage options on the command line.

# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
0
# echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
1
# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
1

x86 and riscv will set up an hstate for PUD_SIZE huge pages by default if
CONFIG_CONTIG_ALLOC is enabled. This is because of a somewhat recent feature
that allowed dynamic allocation of gigantic (page order >= MAX_ORDER) pages.
Before that feature, it made no sense to set up an hstate for gigantic
pages if they were not allocated at boot time and could not be dynamically
added later.

I'll code up a proposal that does the following:
- Have arch specific code provide a list of supported huge page sizes
- Arch independent code uses list to create all hstates
- Move processing of "hugepagesz=" to arch independent code
- Validate "default_hugepagesz=" when value is read from command line
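
To make the steps above concrete, here is a rough, untested userspace
sketch, not kernel code. Only the name arch_hugetlb_valid_size() comes from
the earlier discussion; arch_hugetlb_sizes[], struct hstate_sketch,
hugetlb_add_all_hstates() and set_default_hugepagesz() are hypothetical
names made up for illustration, and the size table simply assumes the
x86-64 values (2M and 1G). Parsing of "hugepagesz=" itself is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_HSTATES 4

/* Hypothetical arch-provided list; values assume x86-64 (2M and 1G). */
static const unsigned long arch_hugetlb_sizes[] = { 2UL << 20, 1UL << 30 };
#define ARCH_NR_SIZES \
	(sizeof(arch_hugetlb_sizes) / sizeof(arch_hugetlb_sizes[0]))

/* Stand-in for the kernel's struct hstate; only the size matters here. */
struct hstate_sketch {
	unsigned long size;
};

static struct hstate_sketch hstates[SKETCH_MAX_HSTATES];
static size_t nr_hstates;
static unsigned long default_hugepagesz;	/* 0 means "not set" */

/* Arch hook: is this size one the architecture supports? */
static bool arch_hugetlb_valid_size(unsigned long size)
{
	for (size_t i = 0; i < ARCH_NR_SIZES; i++)
		if (arch_hugetlb_sizes[i] == size)
			return true;
	return false;
}

/* Arch independent code: create one hstate per supported size. */
static void hugetlb_add_all_hstates(void)
{
	assert(ARCH_NR_SIZES <= SKETCH_MAX_HSTATES);
	nr_hstates = 0;
	for (size_t i = 0; i < ARCH_NR_SIZES; i++)
		hstates[nr_hstates++].size = arch_hugetlb_sizes[i];
}

/* Validate "default_hugepagesz=" at parse time; reject unknown sizes. */
static bool set_default_hugepagesz(unsigned long size)
{
	if (!arch_hugetlb_valid_size(size))
		return false;
	default_hugepagesz = size;
	return true;
}
```

With this shape, a bogus default_hugepagesz= value is rejected when it is
read from the command line, instead of surfacing later in hugetlb_init().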

It may take a few days. When ready, I will pull in the architecture
specific people.


> BTW, because the time difference makes it hard to discuss with you, I
> have another question about the default hugepages to ask here. Why does
> /proc/meminfo only show info about the default hugepages, but not the others?
> meminfo is better known than sysfs; some ordinary users know meminfo but don't
> know how to use sysfs to get the hugepage status (e.g. total, free).

I believe that is simply history. In the beginning there was only the
default huge page size and that was added to meminfo. People then wrote
scripts to parse huge page information in meminfo. When support for
other huge pages was added, it was not added to meminfo as it could break
user scripts parsing the file. Adding information for all potential
huge page sizes may create lots of entries that are unused. I was not
around when these decisions were made, but that is my understanding.
BTW - A recently added meminfo field 'Hugetlb' displays the amount of
memory consumed by huge pages of ALL sizes.
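
For scripts that only need the all-sizes total, a small userspace helper
along these lines should be enough. This is only a sketch: meminfo_kb() is
a hypothetical name, and it assumes the usual "Key:   value kB" line layout
of /proc/meminfo. It takes the file contents as a string so it can be
tried on sample text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the numeric value for `key` in meminfo-style text, or -1 if absent. */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);

	while (line && *line) {
		long val;

		/* Match "key:" at the start of a line, then read the number. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Calling meminfo_kb(buf, "Hugetlb") on the contents of /proc/meminfo gives
the total in kB, and -1 on kernels that predate the field. Per-size counts
still have to come from /sys/kernel/mm/hugepages/.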
--
Mike Kravetz