Re: [PATCH v8 0/4] Introduce mseal

From: Theo de Raadt
Date: Thu Feb 01 2024 - 17:25:10 EST


There is another problem with adding PROT_SEAL to the mprotect()
call.

What are the precise semantics?

If one reviews how mprotect() behaves, it quickly becomes clear that
it is a very sloppy specification. We spent quite a bit of effort
making our manual page as clear as possible about the most that is
guaranteed, in the standard and across the various Unixes:

Not all implementations will guarantee protection on a page basis; the
granularity of protection changes may be as large as an entire region.
Nor will all implementations guarantee to give exactly the requested
permissions; more permissions may be granted than requested by prot.
However, if PROT_WRITE was not specified then the page will not be
writable.

Anything beyond that differs between implementations.

That is the specification in case of PROT_READ, PROT_WRITE, and PROT_EXEC.

What happens if you add additional PROT_* flags?

Does mprotect() still behave just as sloppily (as specified)?

Or does it now return an error partway through an operation?

When it returns the error, does it skip doing the work on the remaining
region?

Or does it skip doing any protection operation at all? (That means the code
has to do two passes over the region: the first checks whether it may proceed,
the second performs the change. I think I've read that PROT_SEAL was supposed
to do things in one pass; is that actually possible without requiring
a second pass in the kernel?)
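
The partial-failure question can at least be observed from userland for the
existing flags. Below is a minimal sketch, not a statement of what any kernel
guarantees: it maps three pages, unmaps the middle one, calls mprotect() over
the whole range, and then reports whether the pages on either side of the hole
are still writable. The helper names page_is_writable() and
probe_partial_mprotect() are made up for illustration, and the probe assumes
that read() into an unwritable page fails with EFAULT rather than delivering
a signal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the kernel can store a byte at addr,
 * 0 if the store faults (EFAULT), -1 on any other error.
 * It asks read() to copy one byte from a pipe into addr, so an unwritable
 * page shows up as an error return instead of SIGSEGV.
 * Note: on success this overwrites the first byte at addr.
 */
static int page_is_writable(void *addr)
{
    int fds[2];
    char byte = 'x';
    int result = -1;

    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], &byte, 1) == 1) {
        ssize_t n = read(fds[0], addr, 1);
        if (n == 1)
            result = 1;
        else if (n == -1 && errno == EFAULT)
            result = 0;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}

/*
 * Hypothetical experiment: mprotect(PROT_READ) across a range with an
 * unmapped page in the middle. Prints what happened; returns the
 * mprotect() return value, or -1 if the setup itself failed.
 */
static int probe_partial_mprotect(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return -1;
    if (munmap(base + page, page) == -1) {      /* punch the hole */
        munmap(base, 3 * page);
        return -1;
    }

    errno = 0;
    int rc = mprotect(base, 3 * page, PROT_READ);
    int saved_errno = errno;

    printf("mprotect over hole: rc=%d errno=%d\n",
           rc, rc == -1 ? saved_errno : 0);
    printf("page before hole writable: %d\n", page_is_writable(base));
    printf("page after hole writable:  %d\n",
           page_is_writable(base + 2 * page));

    munmap(base, page);
    munmap(base + 2 * page, page);
    return rc;
}
```

Calling probe_partial_mprotect() from main() prints the three observations.
If the call fails and the two pages report different writability, the
operation was applied partway before the error was returned; that is the
case a one-pass PROT_SEAL would need to define.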

To wit, do these two sequences have _exactly_ the same behaviour in
all cases that we can think of:
- unmapped sub-regions
- sealed sub-regions
- and who knows what else mprotect() may encounter

a)

mprotect(addr, len, PROT_READ);
mseal(addr, len, 0);

b)

mprotect(addr, len, PROT_READ | PROT_SEAL);

Are they the same, or are they different?
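
For anyone who wants to poke at this, here is a rough sketch of sequence (a)
that reports which of the two steps failed. Sequence (b) cannot be written
against released headers, since PROT_SEAL exists only in this patch series,
so it is left out. Assumptions: the wrapper try_mseal() and the function
protect_then_seal() are made-up names; mseal() is invoked through syscall()
because a libc wrapper may not exist; and the fallback number 462 is my
understanding of the number assigned to mseal, used only if the headers do
not define SYS_mseal. On a kernel without the call, the second step is
expected to fail (typically with ENOSYS).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
#define SYS_mseal 462   /* assumption: syscall number assigned to mseal */
#endif

/* Hypothetical wrapper: raw mseal(addr, len, 0) via syscall(). */
static int try_mseal(void *addr, size_t len)
{
    return (int)syscall(SYS_mseal, addr, len, 0UL);
}

/*
 * Sequence (a): mprotect() first, then mseal() as a separate step.
 * Returns 0 if both steps succeeded, 1 if mprotect() failed,
 * 2 if mprotect() succeeded but mseal() failed. errno is left as set
 * by the failing call.
 */
static int protect_then_seal(void *addr, size_t len)
{
    if (mprotect(addr, len, PROT_READ) == -1) {
        perror("mprotect");
        return 1;
    }
    if (try_mseal(addr, len) == -1) {
        perror("mseal");
        return 2;
    }
    return 0;
}
```

The point of returning 1 versus 2 is that in (a) the caller can always tell
which step failed and what state the region was left in between the two
calls. With a single combined call there is one return value for both
actions, which is where the semantics need to be spelled out.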

Here's what I think: mprotect() behaves quite differently if you add
the PROT_SEAL flag, but I can't tell precisely what happens because
I don't understand the Linux VM system well enough.


(As an outsider, I have glanced at the new PROT_MTE flag changes; that
one seems to just "set a flag where possible", rather than performing
an action which could result in an error, and so seems not to have this
problem.)


As an outsider, Linux development is really strange:

Two sub-features are being pushed very hard, and the primary developer
doesn't have code which uses either of them. And once it goes in, it
cannot be changed.

It's very different from my world, where the absolutely minimal
interface was written to apply to a whole operating system plus 10,000+
applications, and then took months of testing before it was approved for
inclusion. And if it was subtly wrong, we would be able to change it.