Re: Commit 'hugetlbfs: extend the definition of hugepages parameter to support node allocation' breaks old numa less syntax of reserving hugepages on boot.

From: Mike Kravetz
Date: Sun Nov 28 2021 - 23:33:56 EST

Next message: Nishanth Menon: "Re: [PATCH 4/4] mtd: nand: omap2: Add support for NAND Controller on AM64 SoC"
Previous message: Guenter Roeck: "Re: Linux 5.16-rc3"
In reply to: Maxim Levitsky: "Commit 'hugetlbfs: extend the definition of hugepages parameter to support node allocation' breaks old numa less syntax of reserving hugepages on boot."
Next in thread: Zhenguo Yao: "Re: Commit 'hugetlbfs: extend the definition of hugepages parameter to support node allocation' breaks old numa less syntax of reserving hugepages on boot."

On 11/28/21 03:18, Maxim Levitsky wrote:
>
> dmesg prints this:
>
> HugeTLB: allocating 64 of page size 1.00 GiB failed. Only allocated 0 hugepages
>
> Huge pages were allocated on the kernel command line (half of a 128GB system):
>
> 'default_hugepagesz=1G hugepagesz=1G hugepages=64'
>
> This is a 3970X with no real support/need for NUMA, so only fake NUMA node 0 is present.
>
> Reverting the commit helps.
>
> The new syntax also works (hugepages=0:64).
>
> I can test any patches for this bug.

Argh! I think preallocation of gigantic pages on all systems with only
a single node is broken. The issue is at the beginning of
__alloc_bootmem_huge_page:

int __alloc_bootmem_huge_page(struct hstate *h, int nid)
{
	struct huge_bootmem_page *m = NULL; /* initialize for clang */
	int nr_nodes, node;

	if (nid >= nr_online_nodes)
		return 0;

Without the node-specific syntax, nid == NUMA_NO_NODE == -1. For the
comparison, nid is converted to an unsigned int to match nr_online_nodes,
so -1 becomes a huge value and we immediately return 0 instead of doing
the allocations.

Zhenguo Yao,
Can you verify and perhaps put together a patch?

>
> Also unrelated, is there any progress on allocating 1GB pages on demand so that I could
> allocate them only when I run a VM?

That should be possible. Such support was added back in 2014 with commit
944d9fec8d7a "hugetlb: add support for gigantic page allocation at runtime".

>
> I don't mind having these pages marked for userspace use only,
> since as far as I remember it's kernel usage that makes some pages unmovable.
>

Of course, finding 1GB of contiguous space for a gigantic page is often
difficult at runtime. So, allocations are likely to fail the longer the
system is up and running and fragmentation increases.

> Last time (many years ago) I tried to create a zone with only userspace pages
> (I don't remember what options I used) but it didn't work.

Not too long ago, support was added to use CMA for gigantic page allocation.
See commit cf11e85fc08c "mm: hugetlb: optionally allocate gigantic hugepages
using cma". This sounds like something you might want to try.
--
Mike Kravetz

>
> Is there a way to debug what is causing unmovable pages and preventing
> /proc/sys/vm/nr_hugepages from working? (I tried it today and, as usual, the number
> it can allocate steadily decreases over time.)
