RE: [PATCH] mm/hugetlb: avoid hardcoding while checking if cma is reserved

From: Song Bao Hua (Barry Song)
Date: Mon Jul 06 2020 - 18:30:54 EST




> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, July 7, 2020 10:12 AM
> To: 'Roman Gushchin' <guro@xxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; Mike
> Kravetz <mike.kravetz@xxxxxxxxxx>; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>
> Subject: RE: [PATCH] mm/hugetlb: avoid hardcoding while checking if cma is
> reserved
>
>
>
> > -----Original Message-----
> > From: Roman Gushchin [mailto:guro@xxxxxx]
> > Sent: Tuesday, July 7, 2020 9:48 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> > Cc: akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; Mike
> > Kravetz <mike.kravetz@xxxxxxxxxx>; Jonathan Cameron
> > <jonathan.cameron@xxxxxxxxxx>
> > Subject: Re: [PATCH] mm/hugetlb: avoid hardcoding while checking if
> > cma is reserved
> >
> > On Mon, Jul 06, 2020 at 08:44:05PM +1200, Barry Song wrote:
> >
> > Hello, Barry!
> >
> > > hugetlb_cma[0] can be NULL due to various reasons, for example,
> > > node0 has no memory. Thus, NULL hugetlb_cma[0] doesn't necessarily
> > > mean cma is not enabled. Gigantic pages might have been reserved on
> > > other nodes.
> >
> > Just curious, is it a real-life problem you've seen? If so, I wonder
> > how you're using the hugetlb_cma option, and what's the outcome?
>
> Yes. It is kind of stupid, but I once got a board on which node0 has no DDR
> though node1 and node3 have memory.
>
> I would actually prefer to compute the per-node cma size as:
> cma size of one node = hugetlb_cma / (nodes with memory)
> rather than:
> cma size of one node = hugetlb_cma / (all online nodes)
>
> but unfortunately, the N_MEMORY infrastructure is not ready yet. I
> mean:
>
> 	for_each_node_state(nid, N_MEMORY) {
> 		int res;
>
> 		size = min(per_node, hugetlb_cma_size - reserved);
> 		size = round_up(size, PAGE_SIZE << order);
>
> 		res = cma_declare_contiguous_nid(0, size, 0, PAGE_SIZE << order,
> 						 0, false, "hugetlb",
> 						 &hugetlb_cma[nid], nid);
> 		...
> 	}
>
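
To make the difference between the two formulas concrete, here is a small
userspace sketch. It is not kernel code: NR_NODES, the two node tables and
reserve_across() are made-up names, sizes are in whole GiB, and it assumes
that a reservation on a node without memory simply fails and is skipped.

```c
#include <stdbool.h>

#define NR_NODES 4 /* hypothetical four-node board */

/* Hypothetical layout: every node is online, only node1 and node3 have DDR. */
static const bool node_online[NR_NODES]     = { true,  true, true,  true };
static const bool node_has_memory[NR_NODES] = { false, true, false, true };

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Split total_gib across the nodes selected by split_mask and return how
 * much ends up reserved. A reservation on a node without memory is assumed
 * to fail and is skipped.
 */
static unsigned long reserve_across(unsigned long total_gib,
				    const bool *split_mask)
{
	unsigned long per_node, reserved = 0;
	int nid, nr = 0;

	for (nid = 0; nid < NR_NODES; nid++)
		if (split_mask[nid])
			nr++;
	if (!nr)
		return 0;

	per_node = (total_gib + nr - 1) / nr;	/* same idea as DIV_ROUND_UP */

	for (nid = 0; nid < NR_NODES; nid++) {
		unsigned long size;

		if (!split_mask[nid])
			continue;

		size = min_ul(per_node, total_gib - reserved);
		if (!node_has_memory[nid])
			continue;	/* assumed failure: nothing to carve from */

		reserved += size;
		if (reserved >= total_gib)
			break;
	}
	return reserved;
}
```

With a 4 GiB request on this made-up layout, reserve_across(4, node_online)
ends up with 2 GiB, while reserve_across(4, node_has_memory) gets the full
4 GiB.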

And a server has many memory slots. The best configuration would give every
node at least one DDR, but that is not guaranteed; it is entirely up to the
users.

If we move hugetlb_cma_reserve() a bit later, we can probably make the
hugetlb_cma size completely consistent by splitting it across nodes with
memory rather than nodes which are online:

void __init bootmem_init(void)
{
	...

	arm64_numa_init();

	/*
	 * must be done after arm64_numa_init() which calls numa_init() to
	 * initialize node_online_map that gets used in hugetlb_cma_reserve()
	 * while allocating required CMA size across online nodes.
	 */
- #ifdef CONFIG_ARM64_4K_PAGES
-	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
- #endif

	...

	sparse_init();
	zone_sizes_init(min, max);

+ #ifdef CONFIG_ARM64_4K_PAGES
+	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
+ #endif
	memblock_dump_all();
}

For x86, it could be done in a similar way. Do you think it is worth trying?

> >
> > >
> > > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> > > Cc: Roman Gushchin <guro@xxxxxx>
> > > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > > Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> > > Signed-off-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > > ---
> > > mm/hugetlb.c | 18 +++++++++++++++---
> > > 1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 57ece74e3aae..603aa854aa89 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -2571,9 +2571,21 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> > >
> > >  	for (i = 0; i < h->max_huge_pages; ++i) {
> > >  		if (hstate_is_gigantic(h)) {
> > > -			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
> > > -				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
> > > -				break;
> > > +			if (IS_ENABLED(CONFIG_CMA)) {
> > > +				int nid;
> > > +				bool cma_reserved = false;
> > > +
> > > +				for_each_node_state(nid, N_ONLINE) {
> > > +					if (hugetlb_cma[nid]) {
> > > +						pr_warn_once("HugeTLB: hugetlb_cma is reserved,"
> > > +							"skip boot time allocation\n");
> > > +						cma_reserved = true;
> > > +						break;
> > > +					}
> > > +				}
> > > +
> > > +				if (cma_reserved)
> > > +					break;
> >
> > It's a valid problem, and I'd like to see it fixed. But I wonder if it
> > would be better to introduce a new helper, bool hugetlb_cma_enabled(),
> > and move both the IS_ENABLED(CONFIG_CMA) and hugetlb_cma[nid] checks
> > there?
>
> Yep. That would be more readable.
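
A rough idea of what such a helper could look like, written as a userspace
mock rather than a real patch. CMA_CONFIGURED stands in for
IS_ENABLED(CONFIG_CMA), MAX_NUMNODES is redefined locally, and the loop walks
every array slot instead of using for_each_node_state(); all of these are
assumptions made only so the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NUMNODES   4	/* local stand-in for the kernel constant */
#define CMA_CONFIGURED 1	/* stand-in for IS_ENABLED(CONFIG_CMA) */

struct cma;			/* opaque here; only the pointers matter */

/* One CMA area per node; NULL means nothing was reserved on that node. */
static struct cma *hugetlb_cma[MAX_NUMNODES];

/*
 * Return true if a hugetlb CMA area was reserved on any node, not just on
 * node 0. Both the config check and the per-node check live in one place.
 */
static bool hugetlb_cma_enabled(void)
{
	int nid;

	if (!CMA_CONFIGURED)
		return false;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (hugetlb_cma[nid])
			return true;

	return false;
}
```

The caller in hugetlb_hstate_alloc_pages() would then shrink to a single
if (hugetlb_cma_enabled()) check followed by the warning and the break.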
>
> >
> > Thank you!
>
> Thanks
> Barry

Thanks
Barry