On Wed, Jun 25, 2025 at 10:24:53AM +0200, David Hildenbrand wrote:
On 25.06.25 10:12, Lorenzo Stoakes wrote:
On Wed, Jun 25, 2025 at 08:55:28AM +0100, Lorenzo Stoakes wrote:
I suppose the least awful way of addressing Baolin's concerns re: mTHP
while simultaneously keeping existing semantics is:
1. Introduce 'deny' to mean what 'never' should have meant.
To fix Baolin's issue btw we'd have to add 'deny' to both 'global' settings
_and_ each page size setting.
Because otherwise we'd end up in a weird case where say:
global 'deny'
2 MiB 'never'
64 KiB 'inherit'
And err... get 2 MiB THP pages from MADV_COLLAPSE :)
Or:
global 'deny'
2 MiB 'never'
64 KiB 'always'
Or:
global 'never'
2 MiB 'never'
64 KiB 'always'
Or:
global 'never'
2 MiB 'madvise'
64 KiB 'always'
All doing the same. Not very clear, is it?
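For anyone who wants to check which combination a given machine is in today, here is a minimal sketch of picking the active value out of one of these files. It assumes the usual bracketed format that the global file (/sys/kernel/mm/transparent_hugepage/enabled) and the per-size files (hugepages-<size>kB/enabled) use, e.g. "always inherit madvise [never]". The helper name thp_selected() is made up for illustration, and 'deny' is only the proposal above, not a value any kernel reports.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (currently selected) token from a THP sysfs line,
 * e.g. "always inherit madvise [never]", into out as a NUL-terminated
 * string. Hypothetical helper, not a kernel or libc API.
 *
 * Returns 0 on success, -1 if there is no bracketed token or it does
 * not fit in out.
 */
static int thp_selected(const char *line, char *out, size_t outlen)
{
	const char *start = strchr(line, '[');
	const char *end = start ? strchr(start, ']') : NULL;
	size_t len;

	if (!start || !end)
		return -1;

	len = (size_t)(end - start - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, start + 1, len);
	out[len] = '\0';
	return 0;
}
```

Feed it the single line read from each 'enabled' file and compare the global result against each per-size result to see which of the cases above applies.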
We have sowed the seeds of something terrible here, truly.
Fully agreed. "Deny" is nasty. Maybe if we really need a way to disable
MADV_COLLAPSE, it should be done differently, not using this toggle here.
Yeah maybe the best way is to just have another tunable for this?
/sys/kernel/mm/transparent_hugepage/disable_collapse perhaps?
What do you think Hugh, Baolin?
Regarding MADV_COLLAPSE, I strongly assume that we should not change it to
collapse smaller mTHPs as part of the khugepaged mTHP work. For now, it will
simply always collapse to PMD THPs.
Yeah thinking about it maybe this is the best way. And we can then update
the man page to make this ABUNDANTLY clear (am happy to do this).
This keeps things simple.
(One side note on PMD-sized MADV_COLLAPSE - this is basically completely
useless for 64 KiB page size arm64 systems where PMDs are 512 MiB :)
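The 512 MiB figure falls out of simple arithmetic, sketched below. It assumes 8-byte page table entries, so one table page holds page_size / 8 entries and one PMD entry maps a full table's worth of base pages; pmd_size() is just an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes mapped by one PMD entry, assuming 8-byte page table entries so
 * that each table page holds page_size / 8 entries.
 */
static uint64_t pmd_size(uint64_t page_size)
{
	/* page sizes are powers of two and at least one entry wide */
	assert(page_size >= 8 && (page_size & (page_size - 1)) == 0);

	return page_size * (page_size / 8);
}
```

So 4 KiB pages give 4096 * 512 = 2 MiB, 16 KiB pages give 32 MiB, and 64 KiB pages give 65536 * 8192 = 512 MiB.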
Thoughts Baolin?
Once we want to support other sizes, likely MADV_COLLAPSE users want to have
better control over which size to use, at which point it all gets nasty.
madvise2() this time with extra parameters? ;)
I sort of wish we had added a flags parameter there.
But lacking a time machine... :)