Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table

From: David Hildenbrand
Date: Sat May 21 2022 - 16:29:14 EST


On 21.05.22 20:50, Chih-En Lin wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> On 19.05.22 20:31, Chih-En Lin wrote:
>>> When creating a user process, the kernel usually uses the Copy-On-Write
>>> (COW) mechanism to save memory and the time spent copying.
>>> COW defers the work of copying private memory and shares it across the
>>> processes as read-only. If either process writes to this memory, it
>>> will page fault and copy the shared memory, so the process gets its
>>> own private copy. This is called breaking COW.
>>
>> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
>> resulted in PageAnonExclusive, which should hit upstream soon), and
>> hearing about COW of page tables (and wondering how it will interact
>> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
>> me feel a bit uneasy :)
>
> I saw the patch series for this and know how complicated handling COW of
> the physical page was [1][2][3][4]. So the COW page table will tend to
> restrict the sharing only to the page table. This means any modification
> to the physical page will trigger the break COW of page table.
>
> The present implementation only accounts the physical page information
> to the RSS of the owner process of the COW PTE. Generally the owner is
> the parent process. The state of the page, like refcount and mapcount,
> will not change under the COW page table.
>
> But if any situation requires the COW page table to consider the state
> of the physical page, it might get troublesome. ;-)

I haven't looked into the details of how GUP deals with these COW page
tables. But I suspect there might be problems with page pinning:
skipping copy_present_page() even for R/O pages is usually problematic
with R/O pinnings of pages. I might be just wrong.

>
>>>
>>> Presently this technique is only applied to the mapped memory itself.
>>> Fork still needs to copy the entire page table from the parent.
>>> It might cost a lot of time and memory to copy each page table when the
>>> parent already has a lot of page tables allocated. For example, here is
>>> the memory state when forking a process that has mapped 1 GB of memory.
>>>
>>>               mmap before fork    mmap after fork
>>> MemTotal:     32746776 kB         32746776 kB
>>> MemFree:      31468152 kB         31463244 kB
>>> AnonPages:     1073836 kB          1073628 kB
>>> Mapped:          39520 kB            39992 kB
>>> PageTables:       3356 kB             5432 kB
>>
>>
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>>
>> I assume it really only matters if we fork() relatively large processes,
>> like databases for snapshotting. However, fork() is already a pretty
>> severe performance hit due to COW, and there are alternatives getting
>> developed as a replacement for such use cases (e.g., uffd-wp).
>>
>> I'm also missing a performance evaluation: I'd expect some simple
>> workloads that use fork() might be even slower after fork() with this
>> change.
>>
>
> The paper mentioned a list of benchmarks of the time cost for On-Demand
> fork. For example, on Redis, the mean time of fork when taking a
> snapshot: default fork() took 7.40 ms; On-demand Fork (COW PTE table)
> took 0.12 ms. But some other cases, like the response latency
> distribution of Apache HTTP Server, do not benefit significantly from
> their On-demand fork.

Thanks. I expected that snapshotting would pop up and be one of the most
prominent users that could benefit. However, for that specific use case
I am convinced that uffd-wp is the better choice and fork() is just the
old way of doing it, having nothing better at hand. QEMU already
implements snapshotting of VMs that way and I remember that redis also
intended to implement support for uffd-wp. Not sure what happened with
that and if there is anything missing to make it work.
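
To make the fork()-based snapshotting pattern concrete, here is a minimal
user-space sketch. It is not taken from QEMU or redis, and the function
name is made up for illustration. The child keeps seeing the memory as it
was at fork() time while the parent keeps modifying it, which is what COW
of the data pages provides today:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the value the child observes after the
 * parent has modified the page post-fork, or -1 on error.
 */
int snapshot_value_seen_by_child(void)
{
    int pipefd[2];
    int status;
    pid_t pid;
    int *data = mmap(NULL, sizeof(*data), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (data == MAP_FAILED)
        return -1;
    if (pipe(pipefd) < 0) {
        munmap(data, sizeof(*data));
        return -1;
    }

    *data = 1;                      /* state at snapshot time */

    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        munmap(data, sizeof(*data));
        return -1;
    }

    if (pid == 0) {
        char c;

        close(pipefd[1]);
        /* Block until the parent has written to the page. */
        if (read(pipefd[0], &c, 1) != 1)
            _exit(255);
        _exit(*data);               /* report what the snapshot contains */
    }

    close(pipefd[0]);
    *data = 2;                      /* write fault: parent breaks COW */
    if (write(pipefd[1], "x", 1) != 1)
        status = -1;
    close(pipefd[1]);

    if (waitpid(pid, &status, 0) < 0) {
        munmap(data, sizeof(*data));
        return -1;
    }
    munmap(data, sizeof(*data));

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child is expected to report 1 even though the parent already wrote 2.
A uffd-wp based snapshot would get the same isolation without creating a
second process.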

>
> For the COW page table from this patch, I also used perf to analyze
> the time cost. But it looks no different from the default fork.

Interesting, thanks for sharing.

>
> Here is the report; mmap-sfork is the COW page table version:
>
> Performance counter stats for './mmap-fork' (100 runs):
>
>           373.92 msec task-clock        #    0.992 CPUs utilized     ( +- 0.09% )
>                1      context-switches  #    2.656 /sec              ( +- 6.03% )
>                0      cpu-migrations    #    0.000 /sec
>              881      page-faults       #    2.340 K/sec             ( +- 0.02% )
>    1,860,460,792      cycles            #    4.941 GHz               ( +- 0.08% )
>    1,451,024,912      instructions      #    0.78  insn per cycle    ( +- 0.00% )
>      310,129,843      branches          #  823.559 M/sec             ( +- 0.01% )
>        1,552,469      branch-misses     #    0.50% of all branches   ( +- 0.38% )
>
>         0.377007 +- 0.000480 seconds time elapsed  ( +- 0.13% )
>
> Performance counter stats for './mmap-sfork' (100 runs):
>
>           373.04 msec task-clock        #    0.992 CPUs utilized     ( +- 0.10% )
>                1      context-switches  #    2.660 /sec              ( +- 6.58% )
>                0      cpu-migrations    #    0.000 /sec
>              877      page-faults       #    2.333 K/sec             ( +- 0.08% )
>    1,851,843,683      cycles            #    4.926 GHz               ( +- 0.08% )
>    1,451,763,414      instructions      #    0.78  insn per cycle    ( +- 0.00% )
>      310,270,268      branches          #  825.352 M/sec             ( +- 0.01% )
>        1,649,486      branch-misses     #    0.53% of all branches   ( +- 0.49% )
>
>         0.376095 +- 0.000478 seconds time elapsed  ( +- 0.13% )
>
> So, the COW of the page table may reduce the time of forking. But it
> does so by transferring the copy work to later operations that modify
> the physical page.

Right.
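
The source of ./mmap-fork is not part of this thread, so the following is
only a guess at what such a microbenchmark could look like: map and touch
some anonymous memory so the page tables are populated, then time fork()
in isolation. The function name and the choice of CLOCK_MONOTONIC are
assumptions made for illustration:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: map and touch `bytes` of anonymous memory, then
 * measure how long fork() takes in the parent. Returns the elapsed time
 * in nanoseconds, or -1 on error.
 */
long long time_fork_ns(size_t bytes)
{
    struct timespec t0, t1;
    pid_t pid;
    char *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED)
        return -1;

    /* Touch every page so the parent really has page tables to copy. */
    memset(mem, 1, bytes);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid = fork();
    if (pid == 0)
        _exit(0);                   /* child does no work at all */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid < 0) {
        munmap(mem, bytes);
        return -1;
    }

    waitpid(pid, NULL, 0);
    munmap(mem, bytes);

    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_fork_ns((size_t)1 << 30) would correspond to the 1 GB case.
Assuming x86-64 with 4 KiB pages and 8-byte PTEs, 1 GiB of touched memory
needs 262144 PTEs, i.e. 2048 kB of last-level page tables, which is in
line with PageTables growing from 3356 kB to 5432 kB in the cover letter.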

>
>> I have tons of questions regarding rmap, accounting, GUP, page table
>> walkers, OOM situations in page walkers, but at this point I am not
>> (yet) convinced that the added complexity is really worth it. So I'd
>> appreciate some additional information.
>
> It seems like I have a lot of work to do. ;-)

Messing with page tables and COW is usually like opening a can of worms :)

--
Thanks,

David / dhildenb