Re: [PATCH v2 01/17] mm/gup: Fixup p*_access_permitted()

From: Dave Hansen
Date: Fri Dec 15 2017 - 22:21:57 EST


On 12/15/2017 06:52 PM, Linus Torvalds wrote:
> On Fri, Dec 15, 2017 at 6:48 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>> Treating protection key bits as "escalate to page fault and let that
>> deal with the checks" should be fine
>
> Well, it's *semantically* fine and I think it's the right model from
> that standpoint.

It's _close_ to fine. :)

Practically, we're going to have two classes of things in the world:
1. Things that are protected with protection keys and have non-zero
values in the pkey PTE bits.
2. Things that are _not_ protected and have zeros in there.

But, in the hardware, *everything* has a pkey. 0 is the default,
obviously, but the hardware treats it the same as all the other values.
So, if we go checking for the "pkey bits being set", and have behavior
diverge when they are set, we end up with pkey=0 being even more special
compared to the rest.
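
To make the pkey=0 concern concrete, here is a rough userspace sketch
(not the kernel code). The sketch_* names are made up for illustration.
The layout is the x86 one: the pkey lives in PTE bits 62:59, and PKRU
has an access-disable and a write-disable bit per key. It models
user-mode accesses only.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 keeps the 4-bit protection key in PTE bits 62:59. */
#define SKETCH_PTE_PKEY_SHIFT 59
#define SKETCH_PTE_PKEY_MASK  0xfULL

/* PKRU holds two bits per key: access-disable (AD), write-disable (WD). */
#define SKETCH_PKRU_AD_BIT        0x1u
#define SKETCH_PKRU_WD_BIT        0x2u
#define SKETCH_PKRU_BITS_PER_PKEY 2

/* Hypothetical helper: pull the pkey out of a raw PTE value. */
static inline unsigned int sketch_pte_pkey(uint64_t pte)
{
	return (unsigned int)((pte >> SKETCH_PTE_PKEY_SHIFT) &
			      SKETCH_PTE_PKEY_MASK);
}

/* Hypothetical helper: does this PKRU value permit the access for pkey? */
static inline bool sketch_pkru_permits(uint32_t pkru, unsigned int pkey,
				       bool write)
{
	uint32_t bits = pkru >> (pkey * SKETCH_PKRU_BITS_PER_PKEY);

	if (bits & SKETCH_PKRU_AD_BIT)
		return false;	/* all data access disabled for this key */
	if (write && (bits & SKETCH_PKRU_WD_BIT))
		return false;	/* writes disabled for this key */
	return true;
}

/* Uniform model: every PTE is checked, pkey 0 included. */
static inline bool sketch_access_permitted(uint64_t pte, uint32_t pkru,
					    bool write)
{
	return sketch_pkru_permits(pkru, sketch_pte_pkey(pte), write);
}

/* Divergent model: only escalate when the pkey bits are non-zero. */
static inline bool sketch_needs_fault_path(uint64_t pte)
{
	return sketch_pte_pkey(pte) != 0;
}
```

With PKRU access-disable set for key 0, sketch_access_permitted() says
"no" for a pkey-0 PTE, but sketch_needs_fault_path() never escalates it.
That gap is the extra specialness of pkey=0 that would need documenting.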

This might be OK, but it's going to be interesting to document and write
tests for it. I'm already dreading the manpage updates.

> However, since the main use case of protection keys is probably
> databases (Dave?) and since those also might be performance-sensitive
> about direct-IO doing page table lookups, it might not be great in
> practice.

Yeah, databases are definitely the heavy-hitters that care about it.

But, these PKRU checks are cheap. I forget the actual cycle counts, but
I remember thinking that it's pretty darn cheap to read PKRU. In the
grand scheme of doing a page table walk and incrementing an atomic, it's
surely in the noise for direct I/O to large pages, and large pages are
basically guaranteed for the database guys.

I did some get_user_pages() torture tests (on small pages IIRC) before I
put the code in and could not detect a delta between having the code
there and not.