Re: [PATCH v2 3/3] khugepaged: Reduce race probability between migration and khugepaged

From: Dev Jain
Date: Thu Jun 26 2025 - 01:00:16 EST



On 26/06/25 10:27 am, Lorenzo Stoakes wrote:
> On Thu, Jun 26, 2025 at 09:22:28AM +0530, Dev Jain wrote:
>> On 25/06/25 6:58 pm, Lorenzo Stoakes wrote:
>>> On Wed, Jun 25, 2025 at 11:28:06AM +0530, Dev Jain wrote:
>>>> Suppose a folio is under migration, and khugepaged is also trying to
>>>> collapse it. collapse_pte_mapped_thp() will retrieve the folio from the
>>>> page cache via filemap_lock_folio(), thus taking a reference on the folio
>>>> and sleeping on the folio lock, since the lock is held by the migration
>>>> path. Migration will then fail in
>>>> __folio_migrate_mapping -> folio_ref_freeze. Reduce the probability of
>>>> such a race happening (leading to migration failure) by bailing out
>>>> if we detect a PMD is marked with a migration entry.
>>>>
>>>> This fixes the migration-shared-anon-thp testcase failure on Apple M3.
>>> Hm is this related to the series at all? Seems somewhat unrelated?
>> Not related.
>>
>>> Is there a Fixes, Closes, etc.? Do we need something in stable?
>> We don't need anything. This is an "expected race" in the sense that
>> both migration and khugepaged collapse are best effort algorithms.
>> I am just seeing a test failure on my system because my system hits
>> the race more often. So this patch reduces the window for the race.
> Does it rely on previous patches? If not probably best to send this one
> separately :)

To prevent rebasing headaches for others (if any), I thought I would send
them all together. I'll send it separately if that is still the preference.
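
For anyone following along, the race from the commit message can be
modelled in userspace roughly as below. This is only a sketch, not kernel
code: struct toy_folio, toy_ref_freeze() and simulate_migration() are
hypothetical names, and the expected count of 2 is an arbitrary assumption.
The one thing it mirrors is that folio_ref_freeze() succeeds only when the
reference count equals the expected value, so an extra reference held by
khugepaged while it sleeps on the folio lock makes the freeze fail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio: only the refcount is modelled. */
struct toy_folio {
	atomic_int refcount;
};

/*
 * Models the check made by folio_ref_freeze(): succeed only if the count
 * equals what the caller expects, and drop it to zero in that case.
 */
static bool toy_ref_freeze(struct toy_folio *folio, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&folio->refcount, &old, 0);
}

/* Returns true if the modelled migration would succeed. */
static bool simulate_migration(bool khugepaged_holds_ref)
{
	struct toy_folio folio;
	int expected = 2;	/* assumed: references migration accounts for */
	bool frozen;

	atomic_init(&folio.refcount, expected);

	/*
	 * khugepaged looked the folio up in the page cache and now sleeps
	 * on the folio lock, holding one extra reference.
	 */
	if (khugepaged_holds_ref)
		atomic_fetch_add(&folio.refcount, 1);

	frozen = toy_ref_freeze(&folio, expected);
	if (frozen)
		assert(atomic_load(&folio.refcount) == 0);

	return frozen;
}
```

Here simulate_migration(false) returns true and simulate_migration(true)
returns false. Bailing out of the collapse path before the page cache
lookup avoids taking that extra reference in the first place.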