Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE

From: David Hildenbrand
Date: Thu Feb 16 2023 - 12:01:47 EST



There are various reasons why I think a UFFD_FEATURE_WP_UNPOPULATED, using
PTE markers, would be more beneficial:

1) It would be applicable to anon hugetlb

Anon hugetlb should already work on none ptes with the markers?


... really? I thought we'd do the whole pte marker handling only when dealing with hugetlb/shmem. Interesting, thanks. (we could skip population in QEMU in that case as well -- we always do it for now)

2) It would be applicable even when the zeropage is disallowed
(mm_forbids_zeropage())

Do you mean s390 can disable zeropage with mm_uses_skeys()? So far uffd-wp
doesn't support s390 yet, so I'm not sure whether we're over-worried about
this effect.

Or are there any other projects / ideas that could potentially extend
forbidding zero pages to more contexts?

I think it was shown that zeropages can be used to build covert channels (similar to memory deduplication, because it effectively is memory deduplication). It's mentioned as a note in [1] under VII. A. ("Only Deduplicate Zero Pages.")


[1] https://www.ndss-symposium.org/wp-content/uploads/2022-81-paper.pdf


3) It would be possible to optimize even without the huge zeropage, by
using a PMD marker.

This patch doesn't need the huge zeropage to exist.

Yes, and for that reason I think it may perform worse than what we already have in some cases. Instead of populating a single PMD you'll have to fill a full PTE table.


4) It would be possible to optimize even on the PUD level using a PMD
marker.

I think 3+4 is in general an interesting idea on using pte markers on
higher than pte levels, but that needs more changes.

Firstly, continuing to use pte markers effectively preallocates the pgtables,
so a side effect could be speeding up future faults, because they'll all
split into pmd locks and reads don't need to fault at all, only writes.

Imagine you hit a page fault on a pmd marker: it means you'll need to
spread that "marker" information to the child ptes, and you must - it moves
the slow operation of WP into future page faults in some way. In some cases
(I'd say, most cases..) that's not wanted. The same goes for PUDs.

Right, but user space already has that option (see below).



Especially when uffd-wp'ing large ranges that are possibly all unpopulated
(thinking about the existing VM background snapshot use case either with
untouched memory or with things like free page reporting), we might neither
be reading nor writing that memory any time soon.

Right, I think that's a trade-off. But I still think a large portion of
totally unpopulated memory should be the rare case rather than the majority,
or am I wrong? Not to mention that it requires a more involved changeset to
the kernel.

So what I proposed here is the (AFAIU) simplest solution towards providing
such a feature in a complete form. I think we have a chance to implement it
in other ways like pte markers, but that's something we can work on, and
so far I'm not sure how much benefit we can get out of it yet.


What you propose here can already be achieved by user space fairly easily (in fact, the QEMU implementation could be further sped up using MADV_POPULATE_READ). Usually, we only do that when there are very good reasons to (performance).
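
To illustrate the two-step user space approach (populate, then wr-protect), here is a minimal sketch. The helper names populate_for_read() and wp_range() are made up for this example and are not QEMU's; wp_range() assumes the caller already has a userfaultfd with the range registered in UFFDIO_REGISTER_MODE_WP mode, and the fallback loop is only an assumption about how one might cope with a kernel lacking MADV_POPULATE_READ:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* uapi value, available since Linux 5.14 */
#endif

/*
 * Hypothetical helper: make sure every page in [addr, addr + len) is
 * mapped readable, so that a later wr-protect has populated ptes to
 * act on. Returns 0 on success, -1 with errno set on failure.
 */
int populate_for_read(void *addr, size_t len)
{
	long psize = sysconf(_SC_PAGESIZE);
	volatile const unsigned char *p = addr;
	size_t off;

	assert(psize > 0);

	/* Fast path: one syscall prefaults the whole range for reading. */
	if (madvise(addr, len, MADV_POPULATE_READ) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;

	/* EINVAL (e.g. kernel without MADV_POPULATE_READ): read one byte per page. */
	for (off = 0; off < len; off += (size_t)psize)
		(void)p[off];
	return 0;
}

#ifdef UFFDIO_WRITEPROTECT	/* uapi headers from Linux 5.7 onwards */
/*
 * Hypothetical helper: wr-protect an already-populated range.
 * Assumes "uffd" is a userfaultfd on which the range was registered
 * with UFFDIO_REGISTER_MODE_WP.
 */
int wp_range(int uffd, void *addr, size_t len)
{
	struct uffdio_writeprotect wp = {
		.range = { .start = (uintptr_t)addr, .len = len },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};

	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
#endif
```

A caller would run populate_for_read() over the range first and wp_range() second. Note that these are two separate operations, so nothing here makes the pair atomic.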

Using PTE markers would provide a real advantage IMHO for some users (e.g., background snapshots), where we might want to avoid populating zeropages/page tables as completely as we can if the VM memory is mostly untouched.

Naturally, I wonder if UFFD_FEATURE_WP_ZEROPAGE is really worth it. Is there another good reason to combine populating the zeropage + wp that I am missing (e.g., atomicity by doing both in one operation)?

--
Thanks,

David / dhildenb