OK overall the logic looks good now I realise the mistake I made is that
the thp_vma_allowable_orders() check is for the vma_is_anonymous() case
only.
On Mon, Jun 09, 2025 at 02:31:46PM +0800, Baolin Wang wrote:
On 2025/6/7 20:14, Lorenzo Stoakes wrote:
On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
will still attempt to collapse into a shmem THP. This violates the rule we have
agreed upon: never means never.
Ugh that we have separate shmem logic. And split between huge_memory.c and
shmem.c too :)) Again, not your fault, just a general moan about existing
stuff :P
Another rule for madvise, referring to David's suggestion: “allowing for collapsing
in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
Hm I'm not sure if this is enforced is it? I may have missed something here
however.
Then the current strategy is:
For shmem, if none of always, madvise, within_size, and inherit have enabled
PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
Again, is this just MADV_COLLAPSE? Surely this is a general change?
We should be clear that we are not explicitly limiting ourselves to
MADV_COLLAPSE here.
You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
TVA_ENFORCE_SYSFS and that's the key difference here.
Sure.
Thanks!
For tmpfs, if the mount option is set with the 'huge=never' parameter, then
MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
Acked-by: Zi Yan <ziy@xxxxxxxxxx>
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
mm/huge_memory.c | 2 +-
mm/shmem.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3e66136e41a..a8cfa37cae72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
* own flags.
*/
if (!in_pf && shmem_file(vma->vm_file))
- return shmem_allowable_huge_orders(file_inode(vma->vm_file),
+ return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
vma, vma->vm_pgoff, 0,
!enforce_sysfs);
Did you mean to do &&?
No. It might be worth having a separate fix patch here, because the original
logic is incorrect and needs to perform an '&' operation with ’orders‘.
This change should be a general change.
Ah yeah, I did think that _perhaps_ it could be. I think it would make
sense to separate out into another patch albeit very small, just so we can
separate your further changes from the fix for this.
Also, is this achieving what you want to achieve? Is it necessary? The
changes in patch 1/2 enforce the global settings and before this code in
__thp_vma_allowable_orders():
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
... (no early exits) ...
orders &= supported_orders;
if (!orders)
return 0;
...
}
So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
is the only caller of __thp_vma_allowable_orders() then we _always_ exit
early here before we get to this shmem_allowable_huge_orders() code.
Not for this case. Since shmem already supports mTHP, this is to check
whether the 'orders' are enabled in the shmem mTHP configuration. For
example, shmem mTHP might only enable 64K mTHP, which obviously does not
allow PMD-sized THP to collapse.
Yeah sorry I get it now thp_vma_allowable_orders() does a
vma_is_anonymous() predicate. Doh! :P
God what a mess this is (not your fault, pre-existing obviously :P)
Yeah le sigh.
diff --git a/mm/shmem.c b/mm/shmem.c
index 4b42419ce6b2..9af45d4e27e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
return 0;
if (shmem_huge == SHMEM_HUGE_DENY)
return 0;
- if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+ if (shmem_huge == SHMEM_HUGE_FORCE)
return maybe_pmd_order;
OK I get it now, this means the !check sysfs doesn't just get
actioned... :)
/*
@@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
fallthrough;
case SHMEM_HUGE_ADVISE:
- if (vm_flags & VM_HUGEPAGE)
+ if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
return maybe_pmd_order;
fallthrough;
default:
@@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
/* Allow mTHP that will be fully within i_size. */
mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
- if (vm_flags & VM_HUGEPAGE)
+ if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
mask |= READ_ONCE(huge_shmem_orders_madvise);
I'm also not sure these are necessary:
The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
-> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
cover off this case by early exiting __thp_vma_allowable_orders() if orders
== 0 as established in patch 1/2.
Not ture. Shmem has its own separate mTHP sysfs setting, which is different
from the mTHP sysfs setting for anonymous pages mentioned earlier. These
checks are in the shmem file. You can check more for shmem mTHP in
Documentation/admin-guide/mm/transhuge.rst :)
Ah yeah the issue is if (vma_is_anonymous())... doh!
The stack trace is correct though, this is the only place we do it:
~~
static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
loff_t write_end, bool shmem_huge_force,
struct vm_area_struct *vma,
unsigned long vm_flags)
Is called from shmem_getattr():
if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
stat->blksize = HPAGE_PMD_SIZE;
Note that smem_huge_force == false here
And shmem_allowable_huge_orders():
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force)
global_orders = shmem_huge_global_enabled(inode, index, write_end,
shmem_huge_force, vma, vm_flags);
Which forwards 'shmem_huge_force'.
In shmem_get_folow_gfp():
orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
Note that shmem_huge_force == false.
In __thp_vma_allowable_orders();
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
...
bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
...
if (!in_pf && shmem_file(vma->vm_file))
return shmem_allowable_huge_orders(file_inode(vma->vm_file),
vma, vma->vm_pgoff, 0,
!enforce_sysfs);
So we set shmem_huge_force only if TVA_ENFORCE_SYSFS is not set in tva_flags passed to __thp_vma_allowable_orders()
The only caller of __thp_vma_allowable_orders() is thp_vma_allowable_orders().
But yeah we do need to do shmem things... sorry my mistake.