Re: [PATCH] mm: support large mapping building for tmpfs

From: David Hildenbrand
Date: Wed Jul 02 2025 - 04:45:31 EST


Hm, are we sure about that?

IMO, referring to the definition of RSS:
"resident set size (RSS) is the portion of memory (measured in
kilobytes) occupied by a process that is held in main memory (RAM). "

Seems we should report the whole large folio that is already in the
file to users. Moreover, the tmpfs mount already provides the
'huge=always' (or 'huge=within_size') option to allocate large folios,
so the increase in RSS also seems expected?
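
(For reference, these are mount-time options; a minimal sketch using
mount(2) directly, with an arbitrary mount point and size, and
requiring CAP_SYS_ADMIN:)

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* mount point and size are arbitrary for this illustration */
	if (mount("tmpfs", "/mnt/test", "tmpfs", 0,
		  "huge=always,size=1G") != 0) {
		perror("mount");
		return 1;
	}
	printf("tmpfs mounted at /mnt/test with huge=always\n");
	return 0;
}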

Well, traditionally we only account what is actually mapped. If you
MADV_DONTNEED part of the large folio, or only mmap() parts of it,
the RSS would never cover the whole folio -- only what is mapped.
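
To illustrate with a quick userspace sketch (not kernel code; the
4 MiB size is an arbitrary choice):

/*
 * Touch a shared mapping, then MADV_DONTNEED the second half and
 * watch VmRSS in /proc/self/status shrink by that half --
 * RSS tracks what is mapped, not whole folios.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void print_vmrss(const char *tag)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");

	while (f && fgets(line, sizeof(line), f))
		if (!strncmp(line, "VmRSS:", 6))
			printf("%s: %s", tag, line);
	if (f)
		fclose(f);
}

int main(void)
{
	size_t len = 4UL << 20;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);			/* fault in and map all of it */
	print_vmrss("after touching 4 MiB");

	madvise(p + len / 2, len / 2, MADV_DONTNEED);
	print_vmrss("after MADV_DONTNEED on half");

	munmap(p, len);
	return 0;
}

VmRSS should drop by roughly 2 MiB after the MADV_DONTNEED, even if
the range was backed by larger folios.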

I discuss part of that in:

commit 749492229e3bd6222dda7267b8244135229d1fd8
Author: David Hildenbrand <david@xxxxxxxxxx>
Date: Mon Mar 3 17:30:13 2025 +0100

mm: stop maintaining the per-page mapcount of large folios (CONFIG_NO_PAGE_MAPCOUNT)

And how my changes there affect some system stats (e.g., "AnonPages", "Mapped").
But the RSS stays unchanged and corresponds to what is actually mapped into
the process.
Doing something similar for the RSS would be extremely hard (single page mapped into process
-> account whole folio to RSS), because it's per-folio-per-process information, not
per-folio information.


So by mapping more in a single page fault, you end up increasing "RSS". But I wouldn't
call that "expected". I rather suspect that nobody will really care :)



Also, how does fault_around_bytes interact here?

'fault_around' is a bit tricky. Currently, 'fault_around' only
applies to read faults (via do_read_fault()) and does not control
shared write faults (via do_shared_fault()). Additionally, in
do_shared_fault(), PMD-sized large folios are also not controlled by
'fault_around', so I just follow the handling of PMD-sized large
folios.
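
For anyone who wants to see the difference, a rough userspace
experiment (a sketch only: /dev/shm is assumed to be tmpfs, the 8 MiB
size is arbitrary, and the printed numbers depend on the kernel, on
fault_around_bytes -- tunable via debugfs, typically
/sys/kernel/debug/fault_around_bytes -- and on the folio sizes in the
pagecache):

/*
 * A read fault on a MAP_SHARED file mapping goes through
 * do_read_fault() and may map fault_around_bytes worth of
 * already-present pagecache around it, while a write fault goes
 * through do_shared_fault() and is not widened by fault-around.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

static long vmrss_kb(void)
{
	char line[256];
	long kb = -1;
	FILE *f = fopen("/proc/self/status", "r");

	while (f && fgets(line, sizeof(line), f))
		if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
			break;
	if (f)
		fclose(f);
	return kb;
}

int main(void)
{
	size_t len = 8UL << 20;
	char buf[4096] = { 1 };
	int fd = open("/dev/shm/fa-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
	char *p;
	long before;
	volatile char c;

	if (fd < 0)
		return 1;
	/* populate the pagecache before mapping */
	for (size_t off = 0; off < len; off += sizeof(buf))
		if (write(fd, buf, sizeof(buf)) != sizeof(buf))
			return 1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	before = vmrss_kb();
	c = p[0];			/* read fault -> do_read_fault() */
	printf("read fault mapped  %ld kB\n", vmrss_kb() - before);

	before = vmrss_kb();
	p[len / 2] = 2;			/* write fault -> do_shared_fault() */
	printf("write fault mapped %ld kB\n", vmrss_kb() - before);

	(void)c;
	munmap(p, len);
	close(fd);
	unlink("/dev/shm/fa-demo");
	return 0;
}

With the patch under discussion and a huge= tmpfs mount, the
write-fault side would be expected to map the whole large folio rather
than a single page.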

In order to support large mappings for tmpfs, besides checking VMA
limits and PMD pagetable limits, it is also necessary to check whether
the linear page offset of the VMA is order-aligned within the file.

Why?

This only applies to PMD mappings. See below.

I previously had the same question, but I saw the comments for the
thp_vma_suitable_order() function, so I added the check here. If it's
not necessary to check non-PMD-sized large folios, should we update
the comments for thp_vma_suitable_order()?

I was not quite clear about PMD vs. !PMD.

The thing is, when you *allocate* a new folio, it must adhere at least to
pagecache alignment (e.g., cannot place an order-2 folio at pgoff 1) -- that is what
thp_vma_suitable_order() checks. Otherwise you cannot add it to the pagecache.
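
To spell that out with a toy userspace paraphrase (hypothetical
helpers, simplified, PAGE_SHIFT assumed to be 12): the first check is
the pagecache placement invariant, the second is what the "linear page
offset of the VMA is order-aligned within the file" wording above
boils down to.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12

/* an order-N folio can only sit at a file index aligned to 1 << N */
static bool folio_index_ok(unsigned long index, int order)
{
	return (index & ((1UL << order) - 1)) == 0;
}

/* file offset and virtual address must stay congruent modulo the folio size */
static bool vma_linear_offset_ok(unsigned long vm_start,
				 unsigned long vm_pgoff, int order)
{
	unsigned long nr_pages = 1UL << order;

	return (((vm_start >> PAGE_SHIFT) - vm_pgoff) & (nr_pages - 1)) == 0;
}

int main(void)
{
	/* order-2 folio at pgoff 1: violates the pagecache invariant */
	printf("order-2 folio at index 1: %s\n",
	       folio_index_ok(1, 2) ? "ok" : "invalid");
	/* order-2 folio at pgoff 4: fine */
	printf("order-2 folio at index 4: %s\n",
	       folio_index_ok(4, 2) ? "ok" : "invalid");
	/* mapping pgoff 1 at an aligned address: offsets not congruent */
	printf("vm_start 0x200000, vm_pgoff 1, order 2: %s\n",
	       vma_linear_offset_ok(0x200000, 1, 2) ? "suitable" : "unsuitable");
	return 0;
}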

But once you *obtain* a folio from the pagecache and are supposed to map it
into the page tables, that must already hold true.

So you should be able to just blindly map whatever is given to you here
AFAIKS.

If you got a pagecache folio that violates the linear page offset
requirement at that point, something else would already have messed up
the pagecache.

Or am I missing something?

--
Cheers,

David / dhildenb