Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()

From: Dan Williams
Date: Mon Jun 18 2018 - 16:04:45 EST


On Mon, Jun 18, 2018 at 12:31 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> On Mon, Jun 18, 2018 at 12:21:46PM -0700, Dan Williams wrote:
>> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> > On 06/18/2018 10:56 AM, Dan Williams wrote:
>> >> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard <jhubbard@xxxxxxxxxx> wrote:
>> >>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>> >>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>> >>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>> >>>>> these pages are going to be treated specially. In particular, the caller
>> >>>>> does not really want or need to support certain file operations, while the
>> >>>>> page is flagged this way.
>> >>>>>
>> >>>>> If necessary, we could add a new API call.
>> >>>>
>> >>>> That API call is called get_user_pages_longterm.
>> >>>
>> >>> OK...I had the impression that this was just a semi-temporary API for dax, but
>> >>> given that it's an exported symbol, I guess it really is here to stay.
>> >>
>> >> The plan is to go back and provide api changes that bypass
>> >> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>> >> not clear what we could do. In the VFIO case the guest would need to
>> >> be prepared to handle the revocation.
>> >
>> > OK, let's see if I understand that plan correctly:
>> >
>> > 1. Change RDMA users (this could be done entirely in the various device drivers'
>> > code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> > DMA to/from non-pinned pages.
>>
>> The problem with this approach is surprising the RDMA drivers with
>> notifications of teardowns. It's the RDMA userspace applications that
>> need the notification, and it likely needs to be explicit opt-in, at
>> least for the non-ODP drivers.
>
> Well, more than that, we have no real plan on how to accomplish this,
> or any idea if it can even really work. Most userspace applications give up
> control of the memory lifetime to the remote side of the connection
> and have no way to recover it other than a full teardown.
>
> Given that John is trying to fix a kernel oops, I don't think we
> should tie progress on it to the RDMA notification idea.
>
> .. and given that John is trying to fix a kernel oops, maybe the
> weird/bad/ugly behavior of ftruncate is a better bug to have than for
> unprivileged users to be able to oops the kernel???

Trading one bug for another is not a fix. We did not fix the
DAX-dma-vs-truncate bug by breaking truncate() guarantees.