Re: [PATCH v11 0/7] Implement IOCTL to get and optionally clear info about PTEs

From: Peter Xu
Date: Tue Mar 21 2023 - 11:11:02 EST


On Tue, Mar 21, 2023 at 02:41:53PM +0200, Mike Rapoport wrote:
> On Mon, Mar 20, 2023 at 11:30:00AM -0700, Andrei Vagin wrote:
> > On Thu, Mar 9, 2023 at 11:58 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, 9 Mar 2023 18:57:11 +0500 Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx> wrote:
> > >
> > > > Information about whether a page is file mapped, present or
> > > > swapped is required for the CRIU project [5][6]. The required
> > > > mask, any mask, excluded mask and return mask are also required
> > > > for the CRIU project [5].
> > >
> > > It's a ton of new code and what I'm not seeing in here (might have
> > > missed it?) is a clear statement of the value of this feature to our
> > > users.
> > >
> > > I see hints that CRIU would like it, but no description of how valuable
> > > this is to CRIU's users.
> >
> > Hi Andrew,
> >
> > The current interface works for CRIU, and I can't say we have anything
> > critical with it right now.
> >
> > On the other hand, the new interface has a number of significant improvements:
> >
> > * it is more granular and allows us to track changed pages more
> > effectively. The current interface can clear dirty bits for the entire
> > process only. In addition, reading info about pages is a separate
> > operation. It means we must freeze the process to read information
> > about all its pages, reset dirty bits, only then we can start dumping
> > pages. The information about pages becomes more and more outdated,
> > while we are processing pages. The new interface solves both these
> > downsides. First, it allows us to read pte bits and clear the
> > soft-dirty bit atomically. It means that CRIU will not need to freeze
> > processes to pre-dump their memory. Second, it clears soft-dirty bits
> > for a specified region of memory. It means CRIU will have up-to-date
> > info about pages at the moment of dumping them.
> >
> > * The new interface should be much faster because basic page filtering
> > happens in the kernel. With the old interface, we have to read a
> > pagemap entry for each page.
>
> There is still a caveat in using userfaultfd for tracking dirty pages in
> CRIU because we still don't support C/R of processes that use uffd.

This reminds me: could the interface also expose soft-dirty, so that it
acts as a ranged soft-dirty collector and replaces the existing pagemap
read()s? That would cover the case where userfaultfd cannot be used. The
code addition should be trivial, IIUC.
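
For reference, here is a rough sketch of what the existing pagemap read()
path looks like from userspace. It is not taken from the patch set or from
CRIU; the helper names are made up for illustration. The only facts it
relies on are that /proc/<pid>/pagemap holds one 64-bit entry per virtual
page and that bit 55 of an entry is the soft-dirty flag.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Each pagemap entry is one 64-bit word per virtual page. */
#define PAGEMAP_ENTRY_SIZE 8
/* Bit 55 of a pagemap entry: the PTE is soft-dirty. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: file offset of the entry describing vaddr. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(vaddr / (uintptr_t)page_size) * PAGEMAP_ENTRY_SIZE;
}

/* Hypothetical helper: extract the soft-dirty flag from one entry. */
static int entry_is_soft_dirty(uint64_t entry)
{
    return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Hypothetical helper: count soft-dirty pages in [start, start + len).
 * pagemap_fd must be open on /proc/<pid>/pagemap. One pread() is issued
 * per page, which is the per-page cost discussed above.
 * Returns -1 if any entry cannot be read.
 */
static long count_soft_dirty(int pagemap_fd, uintptr_t start, size_t len,
                             long page_size)
{
    long dirty = 0;

    for (uintptr_t addr = start; addr < start + len;
         addr += (uintptr_t)page_size) {
        uint64_t entry;
        ssize_t n = pread(pagemap_fd, &entry, sizeof(entry),
                          pagemap_offset(addr, page_size));

        if (n != (ssize_t)sizeof(entry))
            return -1;
        dirty += entry_is_soft_dirty(entry);
    }
    return dirty;
}
```

A caller would open /proc/self/pagemap with O_RDONLY, take the page size
from sysconf(_SC_PAGESIZE), and pass a page-aligned range. A ranged
collector in the new ioctl would do this filtering in the kernel instead.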

In that case PAGE_IS_WRITTEN may be too generic a name. It could become
two bits, PAGE_IS_UFFD_WP and PAGE_IS_SOFT_DIRTY, with PAGE_IS_UFFD_WP
carrying the inverted meaning of the current PAGE_IS_WRITTEN.
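
To make the two-bit idea concrete, here is a small userspace sketch. The
flag values and both helper names are hypothetical and are not part of the
patch set. It derives the two flags from an existing pagemap entry, where
bit 55 is soft-dirty and bit 57 is uffd-wp, only to show how "written"
would fall out as the inverse of PAGE_IS_UFFD_WP.

```c
#include <stdint.h>

/* Existing pagemap entry bits. */
#define PM_SOFT_DIRTY (1ULL << 55)   /* PTE is soft-dirty */
#define PM_UFFD_WP    (1ULL << 57)   /* PTE is uffd write-protected */

/* Hypothetical flag values, chosen only for this illustration. */
#define PAGE_IS_UFFD_WP    (1u << 0)
#define PAGE_IS_SOFT_DIRTY (1u << 1)

/* Hypothetical helper: map a pagemap entry onto the two proposed flags. */
static unsigned int page_categories(uint64_t entry)
{
    unsigned int cat = 0;

    if (entry & PM_UFFD_WP)
        cat |= PAGE_IS_UFFD_WP;
    if (entry & PM_SOFT_DIRTY)
        cat |= PAGE_IS_SOFT_DIRTY;
    return cat;
}

/*
 * Hypothetical helper: for a page inside a range that was previously
 * write-protected through userfaultfd, the page counts as written once
 * its uffd-wp marker is gone. That is the inverted meaning.
 */
static int written_since_wp(unsigned int cat)
{
    return !(cat & PAGE_IS_UFFD_WP);
}
```

With separate bits, a caller could ask for either tracking mechanism
through the same masks, rather than having one name cover both.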

--
Peter Xu