RE: [PATCH RFC] mm: add MAP_EXCLUSIVE to create exclusive user mappings

From: Reshetova, Elena
Date: Tue Oct 29 2019 - 07:25:20 EST


> The patch below aims to allow applications to create mappings that have
> pages visible only to the owning process. Such mappings could be used to
> store secrets so that these secrets are visible neither to other
> processes nor to the kernel.

Hi Mike,

I have actually been looking into a closely related problem for the past
couple of weeks (on and off). What is common here is the need for userspace
to indicate to the kernel that some pages contain secrets. There are then
a number of things the kernel can do to try to protect these secrets
better. Unmapping them from the direct map is one. Another is to map such
pages as non-cached, which can help us prevent or considerably restrict
speculation on such pages. The initial proof of concept for marking pages as
"UNCACHED" that I got from Dave Hansen was based on mlock2()
with a new flag for this purpose. Since then I have been thinking about which
interface suits the use case better, and I chose a new madvise() flag
instead because of the possible implications for fragmentation and performance.
My logic was that we had better allocate the secret data explicitly (using
mmap()) to make sure that no other process data accidentally suffers.
Imagine I allocate a buffer to hold a secret key, signal with mlock
to protect it, and suddenly my other high-throughput non-secret buffer
(which happened to live on the same page by chance) becomes very slow,
and I don't even have an easy way (apart from mmap()ing it!) to guarantee
that it won't be affected.

So, I ended up with something like:

secret_buffer = mmap(NULL, PAGE_SIZE, ...)
madvise(secret_buffer, PAGE_SIZE, MADV_SECRET)
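
A slightly fuller sketch of that sequence in C, under stated assumptions:
MADV_SECRET is the flag proposed here and is not defined in mainline headers,
so it is only used if the build happens to define it. The mmap() arguments
(private anonymous, read/write) and the helper names secret_map() and
secret_mark() are illustrative choices, not taken from the work-in-progress
tree.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a dedicated region for secret data so that no unrelated buffer
 * shares its pages. Returns NULL on failure.
 * The protection and mapping flags are assumptions for illustration.
 */
void *secret_map(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Mark the region as secret. MADV_SECRET is hypothetical (the flag
 * proposed in this thread); without it this helper only reports that
 * the feature is unavailable.
 * Returns 0 on success, -1 on failure or when unsupported.
 */
int secret_mark(void *p, size_t len)
{
#ifdef MADV_SECRET
    return madvise(p, len, MADV_SECRET);
#else
    (void)p;
    (void)len;
    return -1;
#endif
}
```

A caller would map one page with secret_map(sysconf(_SC_PAGESIZE)), call
secret_mark() on it, and decide for itself whether a -1 result is fatal.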

I have work in progress code here:
https://github.com/ereshetova/linux/commits/madvise

I haven't sent it for review because it is not ready yet, and I am now working
on adding the page wiping functionality. Without it, it would be useless
to protect the page while it is used in userspace but then allow it
to be reused by a different process after it has been released, if
userspace was stupid enough not to wipe the contents (or was crashed on
purpose before it was able to wipe anything).

We have also had some discussions with Tycho that XPFO could also be
applied selectively to such "SECRET"-marked pages, and I know that he has also
done some initial prototyping on this, so I think it would be great to decide
on the userspace interface first and then see how we can assemble all
these features together.

The *very* long-term goal for all of this would be something that Alan Cox
suggested when I started looking into this: a new libc function to
allocate memory in a secure way, which can hide all the dancing with
mmap()/madvise() (and/or potentially interaction with the chardev that Andy
was also suggesting) and implement an efficient allocator for such secret
pages. OpenSSL has its own version of a "secure heap", which is essentially
an mmap area with additional MLOCK_ONFAULT and MADV_DONTDUMP flags for
protection. Other apps or libs must use something similar if they want
additional protection, which makes them reimplement the same concept again
and again. Sadly or surprisingly, other major libs like BoringSSL and mbedTLS,
or clients like OpenSSH, do not use any mlock()/madvise() flags for additional
protection of the secrets that they hold in memory. Maybe if all of it were
behind a single secure API, the situation in userspace would start to change
for the better.
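
To make the "secure heap" idea concrete, here is a minimal sketch of what such
a wrapper does today with existing flags only. The names secure_alloc() and
secure_free() are hypothetical, this is not OpenSSL's implementation, and
locking and dump exclusion are treated as best effort (mlock2() can fail, for
example because of RLIMIT_MEMLOCK). It assumes a libc that provides mlock2()
and explicit_bzero(), such as a recent glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a length up to a whole number of pages. */
static size_t round_up_to_pages(size_t len)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    return (len + ps - 1) / ps * ps;
}

/*
 * Hypothetical helper: allocate a dedicated mapping for secrets.
 * Returns NULL if the mapping itself fails.
 */
void *secure_alloc(size_t len)
{
    assert(len > 0);
    size_t maplen = round_up_to_pages(len);
    void *p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Best effort: keep the pages out of swap once they are faulted in. */
#ifdef MLOCK_ONFAULT
    (void)mlock2(p, maplen, MLOCK_ONFAULT);
#else
    (void)mlock(p, maplen);
#endif
    /* Best effort: keep the pages out of core dumps. */
    (void)madvise(p, maplen, MADV_DONTDUMP);
    return p;
}

/*
 * Hypothetical helper: wipe the mapping, then release it.
 * len must be the same value that was passed to secure_alloc().
 */
int secure_free(void *p, size_t len)
{
    size_t maplen = round_up_to_pages(len);
    explicit_bzero(p, maplen);  /* not optimized away, unlike memset() */
    return munmap(p, maplen);
}
```

Each allocation here costs at least a full page, which is exactly why an
efficient shared allocator behind a single libc API would be preferable to
every library rolling its own.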

Best Regards,
Elena.
