Re: [PATCH] huge tmpfs: try to split_huge_page() when punching hole

From: Hugh Dickins
Date: Thu Feb 27 2020 - 23:04:42 EST


On Thu, 27 Feb 2020, Kirill A. Shutemov wrote:
> On Wed, Feb 26, 2020 at 08:06:33PM -0800, Hugh Dickins wrote:
> > Yang Shi writes:
> >
> > Currently, when truncating a shmem file, if the range is partly in a THP
> > (start or end is in the middle of THP), the pages actually will just get
> > cleared rather than being freed, unless the range covers the whole THP.
> > Even though all the subpages are truncated (randomly or sequentially),
> > the THP may still be kept in page cache.
> >
> > This might be fine for some usecases which prefer preserving THP, but
> > balloon inflation is handled in base page size. So when using shmem THP
> > as memory backend, QEMU inflation actually doesn't work as expected since
> > it doesn't free memory. But the inflation usecase really needs to get
> > the memory freed. (Anonymous THP will also not get freed right away,
> > but will be freed eventually when all subpages are unmapped: whereas
> > shmem THP still stays in page cache.)
> >
> > Split THP right away when doing partial hole punch, and if split fails
> > just clear the page so that read of the punched area will return zeroes.
> >
> > Hugh Dickins adds:
> >
> > Our earlier "team of pages" huge tmpfs implementation worked in the way
> > that Yang Shi proposes; and we have been using this patch to continue to
> > split the huge page when hole-punched or truncated, since converting over
> > to the compound page implementation. Although huge tmpfs gives out huge
> > pages when available, if the user specifically asks to truncate or punch
> > a hole (perhaps to free memory, perhaps to reduce the memcg charge), then
> > the filesystem should do so as best it can, splitting the huge page.
>
> I'm still uncomfortable with proposition to use truncate or punch a hole
> operations to manage memory footprint. These operations are about managing
> storage footprint, not memory. This happens to be the same for tmpfs.

I'd slightly reword that as "These operations are mainly about managing
storage footprint. This happens to be the same as memory for tmpfs."
and then happily agree with it.

>
> I wounder if we should consider limiting the behaviour to the operation
> that explicitly combines memory and storage managing: MADV_REMOVE.

I'd strongly oppose letting MADV_REMOVE diverge from FALLOC_FL_PUNCH_HOLE:
if it came down to that, I would prefer to revert this patch.

> This way we can avoid future misunderstandings with THP backed by a real
> filesystem.

It's good to consider the implications for hole-punch on a persistent
filesystem cached with THPs (or lower order compound pages); but I
disagree that they should behave differently from this patch.

The hole-punch is fundamentally directed at freeing up the storage, yes;
but its page cache must also be removed, otherwise you have the user
writing into cache which is not backed by storage, and potentially losing
the data later. So a hole must be punched in the compound page in that
case too: in fact, it's then much more important that split_huge_page()
succeeds - not obvious what the fallback should be if it fails (perhaps
in that case the compound page must be kept, but all its pmds removed,
and info on holes kept in spare fields of the compound page, to prevent
writes and write faults without calling back into the filesystem:
soluble, but more work than tmpfs needs today)(and perhaps when that
extra work is done, we would choose to rely on it rather than
immediately splitting; but it will involve discounting the holes).

Hugh