Re: [PATCH v2] mm/page_alloc: count CMA pages per zone and print them in /proc/zoneinfo

From: David Hildenbrand
Date: Thu Jan 28 2021 - 17:32:20 EST



> On 28.01.2021 at 23:28, David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> On Thu, 28 Jan 2021, David Hildenbrand wrote:
>
>>> On Thu, 28 Jan 2021, David Hildenbrand wrote:
>>>
>>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>>> index 7758486097f9..957680db41fa 100644
>>>> --- a/mm/vmstat.c
>>>> +++ b/mm/vmstat.c
>>>> @@ -1650,6 +1650,11 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>>>  		   zone->spanned_pages,
>>>>  		   zone->present_pages,
>>>>  		   zone_managed_pages(zone));
>>>> +#ifdef CONFIG_CMA
>>>> +	seq_printf(m,
>>>> +		   "\n cma %lu",
>>>> +		   zone->cma_pages);
>>>> +#endif
>>>>
>>>>  	seq_printf(m,
>>>>  		   "\n protection: (%ld",
>>>
>>> Hmm, not sure about this. If cma is only printed for CONFIG_CMA, we can't
>>> distinguish, without some version checking, between (1) a kernel without
>>> your patch and (2) a kernel without CONFIG_CMA enabled. IOW, "cma 0"
>>> carries value: we know immediately that we do not have any CMA pages on
>>> this zone, period.
>>>
>>> /proc/zoneinfo is also not known for its conciseness so I think printing
>>> "cma 0" even for !CONFIG_CMA is helpful :)
>>>
>>> I think this #ifdef should be removed and it should call into a
>>> zone_cma_pages(struct zone *zone) which returns 0UL if disabled.
>>>
>>
>> Yeah, that’s also what I proposed in a sub-thread here.
>>
>
> Ah, I certainly think your original intuition was correct.
>
>> The last option would be going the full mile and not printing nr_free_cma. Code might get a bit uglier though, but we could also remove that stats counter ;)
>>
>> I don't particularly care; while printing "0" might be easier, removing nr_free_cma might be cleaner.
>>
>> But then, maybe there are tools that expect that value to be around on any kernel?
>>
>
> Yeah, that's probably undue risk, the ship has sailed and there's no
> significant upside.
>
> I still think "cma 0" in /proc/zoneinfo carries value, though, especially
> for NUMA and it looks like this is how it's done in linux-next. With a
> single read of the file, userspace can make the determination what CMA
> pages exist on this node.
>
> In general, I think the rule-of-thumb is that the fewer ifdefs in
> /proc/zoneinfo, the easier it is for userspace to parse it.

Makes sense, I'll send an updated version tomorrow - thanks!


>
> (I made that change to /proc/zoneinfo to even print non-existent zones for
> each node because otherwise you cannot determine what the indices of
> things like vm.lowmem_reserve_ratio represent.)
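
For illustration, the zone_cma_pages() helper suggested above could look roughly like the sketch below. This is a minimal userspace sketch, not the actual patch: struct zone here is a reduced stand-in for the kernel structure, format_cma_line() is a hypothetical substitute for the seq_printf() call in zoneinfo_show_print(), and the format string is copied from the quoted hunk as it appears in this mail. The point is that the #ifdef moves into the helper, so the printing code always emits a "cma" line.

```c
#include <stdio.h>

/*
 * Reduced stand-in for the kernel's struct zone (assumption: only the
 * fields needed here). cma_pages exists only when CONFIG_CMA is defined.
 */
struct zone {
#ifdef CONFIG_CMA
	unsigned long cma_pages;
#endif
	unsigned long present_pages;
};

/* Return the zone's CMA page count, or 0UL when CMA is compiled out. */
static inline unsigned long zone_cma_pages(const struct zone *zone)
{
#ifdef CONFIG_CMA
	return zone->cma_pages;
#else
	(void)zone;	/* unused without CONFIG_CMA */
	return 0UL;
#endif
}

/*
 * Hypothetical userspace substitute for the seq_printf() call: the
 * caller needs no #ifdef, so "cma" is printed on every configuration.
 */
static int format_cma_line(char *buf, size_t len, const struct zone *zone)
{
	return snprintf(buf, len, "\n cma %lu", zone_cma_pages(zone));
}
```

Built without -DCONFIG_CMA, format_cma_line() produces a "cma 0" line, which is the behaviour argued for above: userspace sees the field on every kernel that carries the patch, whether or not CMA is enabled.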