Re: [PATCH v3 0/5] add persistent huge zero folio support

From: Kiryl Shutsemau
Date: Mon Aug 11 2025 - 05:43:41 EST


On Mon, Aug 11, 2025 at 10:41:08AM +0200, Pankaj Raghav (Samsung) wrote:
> From: Pankaj Raghav <p.raghav@xxxxxxxxxxx>
>
> Many places in the kernel need to zero out larger chunks, but the
> maximum segment we can zero out at a time by ZERO_PAGE is limited by
> PAGE_SIZE.
>
> This concern was raised during the review of adding Large Block Size support
> to XFS[2][3].
>
> This is especially annoying in block devices and filesystems where
> multiple ZERO_PAGEs are attached to the bio in different bvecs. With multipage
> bvec support in block layer, it is much more efficient to send out
> larger zero pages as a part of single bvec.
>
> Some examples of places in the kernel where this could be useful:
> - blkdev_issue_zero_pages()
> - iomap_dio_zero()
> - vmalloc.c:zero_iter()
> - rxperf_process_call()
> - fscrypt_zeroout_range_inline_crypt()
> - bch2_checksum_update()
> ...
>
> Usually huge_zero_folio is allocated on demand, and it will be
> deallocated by the shrinker if there are no users of it left. At the moment,
> huge_zero_folio infrastructure refcount is tied to the lifetime of the process
> that created it. This might not work for bio layer as the completions
> can be async and the process that created the huge_zero_folio might no
> longer be alive. And, one of the main points that came up during discussion
> is to have something bigger than zero page as a drop-in replacement.
>
> Add a config option PERSISTENT_HUGE_ZERO_FOLIO that will always allocate
> the huge_zero_folio, and disable the shrinker so that huge_zero_folio is
> never freed.
> This makes it possible to use the huge_zero_folio without passing any mm struct and does
> not tie the lifetime of the zero folio to anything, making it a drop-in
> replacement for ZERO_PAGE.
>
> I have converted blkdev_issue_zero_pages() as an example as a part of
> this series. I also noticed close to 4% performance improvement just by
> replacing ZERO_PAGE with persistent huge_zero_folio.
>
> I will send patches to individual subsystems using the huge_zero_folio
> once this gets upstreamed.
>
> Looking forward to some feedback.

Why does it need to be compile-time? Maybe whoever needs huge zero page
would just call get_huge_zero_page()/folio() on initialization to get it
pinned?
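
To make the question concrete, here is a rough userspace model of the
pin-at-init idea. It is only a sketch, not kernel code: every name in it
(model_get_huge_zero(), model_put_huge_zero(), model_shrink()) is
hypothetical and merely stands in for the real get/put helpers and the
shrinker. It assumes a lazily allocated 2 MiB zeroed buffer with a plain
reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; all names below are hypothetical, not kernel APIs. */
#define MODEL_HUGE_SIZE (2UL * 1024 * 1024)

static unsigned char *model_zero_buf;	/* lazily allocated zeroed buffer */
static unsigned long model_refcount;	/* number of users holding it */

/* Take a reference, allocating on first use. Returns NULL on failure. */
static const unsigned char *model_get_huge_zero(void)
{
	if (!model_zero_buf) {
		model_zero_buf = calloc(1, MODEL_HUGE_SIZE);
		if (!model_zero_buf)
			return NULL;
	}
	model_refcount++;
	return model_zero_buf;
}

/* Drop a reference previously taken by model_get_huge_zero(). */
static void model_put_huge_zero(void)
{
	assert(model_refcount > 0);
	model_refcount--;
}

/* Shrinker stand-in: frees the buffer only when nobody holds a reference. */
static bool model_shrink(void)
{
	if (model_refcount || !model_zero_buf)
		return false;
	free(model_zero_buf);
	model_zero_buf = NULL;
	return true;
}
```

In this model, a user that calls model_get_huge_zero() once during its
initialization and never drops the reference keeps the buffer alive across
any number of model_shrink() calls, with no build-time switch involved.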

--
Kiryl Shutsemau / Kirill A. Shutemov