Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap

From: Huan Yang
Date: Sun Apr 06 2025 - 22:01:14 EST



On 2025/4/4 18:07, Muchun Song wrote:

On Apr 4, 2025, at 17:38, Muchun Song <muchun.song@xxxxxxxxx> wrote:



On Apr 4, 2025, at 17:01, Christoph Hellwig <hch@xxxxxx> wrote:

After the btrfs compressed bio discussion I think the hugetlb changes that
skip the tail pages are fundamentally unsafe in the current kernel.

That is because the bio_vec representation assumes tail pages do exist, so
as soon as you are doing direct I/O that generates a bvec starting beyond
the present head page things will blow up. Other users of bio_vecs might
do the same, but the way the block bio_vecs are generated is very susceptible
to that. So we'll first need to sort that out and a few other things
before we can even think of enabling such a feature.

I would like to express my gratitude to Christoph for including me in the
thread. I have carefully read the cover letter in [1], which indicates
that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
wondering if we could consider using `vmap()` instead. In the HVO scenario,
the tail struct pages do **exist**, but they are read-only. I've examined
the code of `vmap()`, and it appears that it only reads the struct page.
Therefore, it seems feasible for us to use `vmap()` (I am not an expert in
udmabuf). Right?
I believe my stance is correct. I've also reviewed another thread in [2].
Allow me to clarify and correct the viewpoints you presented. You stated:
"
So by HVO, it also not backed by pages, only contains folio head, each
tail pfn's page struct go away.
"
This statement is entirely inaccurate. The tail pages do not cease to exist;
rather, they are read-only. For your specific use-case, please use `vmap()`
to resolve the issue at hand. If you wish to gain a comprehensive understanding

I see that the document gives a simple diagram to illustrate this:

 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    PMD    |                     +-----------+                       | | |
 |   level   |                     |     5     | ----------------------+ | |
 |  mapping  |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+
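
To make the arrows in the diagram concrete, here is a minimal user-space
sketch (not kernel code). It assumes exactly the layout drawn above: 8 vmemmap
pages per PMD-level HugeTLB page, with pages 0 and 1 keeping their own page
frames and pages 2-7 all pointing at frame 1. The helper names
hvo_backing_frame() and hvo_freed_frames() are hypothetical and exist only for
illustration.

```c
#include <assert.h>

/* Values read off the diagram, not taken from kernel headers. */
#define VMEMMAP_PAGES 8 /* vmemmap pages describing one HugeTLB page */
#define KEPT_FRAMES   2 /* page frames 0 and 1 stay allocated */

/*
 * Hypothetical helper: return the page frame that backs vmemmap page idx
 * in the diagram, or -1 if idx is out of range.
 */
int hvo_backing_frame(int idx)
{
	if (idx < 0 || idx >= VMEMMAP_PAGES)
		return -1;
	/* Pages 0 and 1 map to themselves, pages 2-7 alias frame 1. */
	return idx < KEPT_FRAMES ? idx : KEPT_FRAMES - 1;
}

/* Hypothetical helper: number of page frames no longer needed. */
int hvo_freed_frames(void)
{
	return VMEMMAP_PAGES - KEPT_FRAMES;
}

/* Self-check mirroring the arrows in the diagram. */
void hvo_self_check(void)
{
	assert(hvo_backing_frame(0) == 0);
	assert(hvo_backing_frame(1) == 1);
	assert(hvo_backing_frame(7) == 1);
	assert(hvo_freed_frames() == 6);
}
```

In this model every vmemmap page still resolves to some page frame, so reading
the struct page entries for pages 2-7 still returns data. What changes is that
six of the eight slots share one frame.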

If I understand correctly, the struct pages for tail pages 2-7 are freed. So if
I only need to map pages 2-7, can vmap() still work correctly?

Or, if I am still misunderstanding something, please correct me.

Thanks,

Huan Yang

of the fundamentals of HVO, I kindly suggest a thorough review of the document
in [3].

[2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@xxxxxxxx/#t
[3] Documentation/mm/vmemmap_dedup.rst

[1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@xxxxxxxx/T/#m055b34978cf882fd44d2d08d929b50292d8502b4

Thanks,
Muchun.