Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table

From: Andy Lutomirski
Date: Sat May 21 2022 - 18:19:52 EST


On 5/21/22 13:12, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>
> As I get older (and crankier), I get less convinced that fork() is
> really the right solution for implementing system(). I feel that a
> better model is to create a process with zero threads, but have an fd
> to it. Then manipulate the child process through its fd (eg mmap
> ld.so, open new fds in that process's fdtable, etc). Closing the fd
> launches a new thread in the process (ensuring nobody has an fd to a
> running process, particularly one which is setuid).

Heh, I learned serious programming on Windows, and I thought fork() was entertaining, cool, and a bad idea when I first learned about it. (I admit I did think the fact that POSIX fork and exec had many fewer arguments than CreateProcess was a good thing.) Don't even get me started on setuid -- if I had my way, distros would set NO_NEW_PRIVS on boot for the entire system.
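
For reference, here is a minimal sketch of what setting NO_NEW_PRIVS looks like today for a single task, using prctl(2). The helper names are hypothetical and exist only for illustration. The flag is inherited across fork() and execve() and cannot be cleared once set, so a distro-wide default would presumably amount to doing this very early in boot.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set no_new_privs on the calling thread.
 * After this, execve() no longer grants privileges via setuid/setgid
 * bits or file capabilities. Returns 0 on success, -1 on error.
 */
static int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Hypothetical helper: query the flag for the calling thread.
 * Returns 1 if set, 0 if not set, -1 on error.
 */
static int get_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Calling set_no_new_privs() and then get_no_new_privs() should report the flag as set on any kernel that supports it (Linux 3.5 and later).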

I can see a rather different use for this type of shared-pagetable technology, though: monstrous MAP_SHARED mappings. For database and some VM users, multiple processes will map the same file. If there were a way to ensure appropriate alignment (or at least encourage it) and a way to handle mappings that don't cover the whole file, then having multiple mappings share the same page tables could be a decent efficiency gain. This doesn't even need COW -- it's "just" pagetable sharing.
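
To make the alignment point concrete, here is a rough sketch of the arithmetic, assuming x86-64 with 4 KiB pages and 512 entries per last-level page table, so that one table spans 2 MiB. The function name and the "congruent modulo 2 MiB" rule are assumptions made for illustration, not an existing kernel interface or policy. It counts how many last-level tables a mapping covers completely and lays out in a way that another process mapping the same file range could reuse.

```c
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB pages, 512 PTEs per table. */
#define ASSUMED_PAGE_SIZE 4096ULL
#define PTES_PER_TABLE    512ULL
#define PTE_TABLE_SPAN    (ASSUMED_PAGE_SIZE * PTES_PER_TABLE) /* 2 MiB */

/*
 * Hypothetical helper: number of last-level page tables that lie
 * entirely inside the mapping [addr, addr + len) and whose first entry
 * corresponds to a 2 MiB-aligned file offset. Partially covered tables
 * at either end are not counted. Overflow of addr + len is ignored.
 */
static uint64_t shareable_pte_tables(uint64_t addr, uint64_t file_off,
                                     uint64_t len)
{
    /* Virtual address and file offset must agree modulo the span. */
    if ((addr % PTE_TABLE_SPAN) != (file_off % PTE_TABLE_SPAN))
        return 0;

    /* Round the start up and the end down to table boundaries. */
    uint64_t start = (addr + PTE_TABLE_SPAN - 1) / PTE_TABLE_SPAN
                     * PTE_TABLE_SPAN;
    uint64_t end = (addr + len) / PTE_TABLE_SPAN * PTE_TABLE_SPAN;

    return end > start ? (end - start) / PTE_TABLE_SPAN : 0;
}
```

Under these assumptions, a fully aligned 1 TiB mapping covers 524288 last-level tables, which is 2 GiB of page tables per process when each table is a 4 KiB page. That per-process cost is what sharing would avoid.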

It's probably a pipe dream, but I like to imagine that the bookkeeping that would enable this would also enable a much less ad-hoc concept of who owns which pagetable page. Then things like x86's KPTI LDT mappings would be less disgusting under the hood.

Android would probably like a similar feature for MAP_ANONYMOUS, or something that could otherwise enable Zygote to share paging structures (ideally without fork(), although that's my dream, not necessarily Android's). This is more complex, since COW is involved. It is also possibly less valuable -- possibly the entire benefit and then some would be achieved by using huge pages for Zygote and arranging for COWing one normal-size page out of a hugepage COW mapping to only COW the one page.
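
The userspace half of the huge-page idea can be sketched with existing interfaces: allocate a 2 MiB-aligned anonymous region and hint it for transparent huge pages with madvise(2). The helper name and the 2 MiB huge page size are assumptions for illustration. This only requests huge pages; what the kernel copies on a later COW fault is up to the kernel and is not changed by this code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed huge page size (x86-64 PMD-sized pages). */
#define HUGE_ALIGN (2UL * 1024 * 1024)

/*
 * Hypothetical helper: map `len` bytes of private anonymous memory at a
 * 2 MiB-aligned address and hint it for THP. `len` must be a nonzero
 * multiple of HUGE_ALIGN. Returns NULL on failure.
 */
static void *alloc_thp_hinted(size_t len)
{
    if (len == 0 || len % HUGE_ALIGN != 0)
        return NULL;

    /* Over-allocate so an aligned window of `len` bytes must exist. */
    size_t padded = len + HUGE_ALIGN;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t base = ((uintptr_t)raw + HUGE_ALIGN - 1)
                     & ~(uintptr_t)(HUGE_ALIGN - 1);
    char *aligned = (char *)base;
    char *tail = aligned + len;

    /* Trim the unaligned head and the unused tail. */
    if (aligned > raw)
        munmap(raw, (size_t)(aligned - raw));
    if (raw + padded > tail)
        munmap(tail, (size_t)(raw + padded - tail));

#ifdef MADV_HUGEPAGE
    /* Advisory only: fails harmlessly if THP is not available. */
    (void)madvise(aligned, len, MADV_HUGEPAGE);
#endif
    return aligned;
}
```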

--Andy