Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

From: Linus Torvalds
Date: Wed Oct 09 2019 - 22:09:48 EST


On Wed, Oct 9, 2019 at 6:10 PM Thomas HellstrÃm (VMware)
<thomas_os@xxxxxxxxxxxx> wrote:
>
> Your original patch does exactly the same!

Oh, no. You misread my original patch.

Look again.

The logic in my original patch was very different. It said that

- *if* we have a pmd_entry function, then we obviously call that one.

And if - after calling the pmd_entry function - we are still a
hugepage, then we skip the pte_entry case entirely.

And part of skipping is obviously "don't split" - but it never had
that "don't split and then call the pte walker" case.

- and if we *don't* have a pmd_entry function, but we do have a
pte_entry function, then we always split before calling it.

Notice the difference?

So instead of looking at the return value of the pmd_entry() function,
the approach of that suggested patch was to basically say that if the
pmd_entry function wants us to go another level deeper and it was a
hugepmd, it needed to split the pmd to make that happen.

That's actually very different from what your patch did. My original
patch never tried to walk the pte level without having one - either it
*checked* that it had one, or it split.

But I see where you might have misread the patch, particularly if you
only looked at it as a patch, not as the end result of the patch.

The end result of that patch was this:

if (ops->pmd_entry) {
err = ops->pmd_entry(pmd, addr, next, walk);
if (err)
break;
/* No pte level walking? */
if (!ops->pte_entry)
continue;
/* No pte level at all? */
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)
|| pmd_devmap(*pmd))
continue;
} else {
if (!ops->pte_entry)
continue;

split_huge_pmd(walk->vma, pmd, addr);
if (pmd_trans_unstable(pmd))
goto again;
}
err = walk_pte_range(pmd, addr, next, walk);

and look at thew two different sides of the if-statement: if they get
to "walk_pte_range()", both cases wil have verified that there
actually _is_ a pte level. They will just have done it differently. -
the "we didn't have a pmd function" will have split the pmd if it was
a hugepmd, while the "we do have a pmd_entry" case will just check
whether it's still a huge-pmd, and done a "continue" if it was and
never even tried to walk the ptes.

But I think the "change pmd_entry to have a sane return code" is a
simpler and more flexible model, and then the pmd_entry code can just
let the pte walker split the pmd if needed.

So I liked that part of your patch.

Linus