Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

From: Ira Weiny
Date: Fri Aug 23 2019 - 13:15:13 EST


On Fri, Aug 23, 2019 at 10:59:14AM +1000, Dave Chinner wrote:
> On Wed, Aug 21, 2019 at 11:02:00AM -0700, Ira Weiny wrote:
> > On Tue, Aug 20, 2019 at 08:55:15AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Aug 20, 2019 at 11:12:10AM +1000, Dave Chinner wrote:
> > > > On Mon, Aug 19, 2019 at 09:38:41AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, Aug 19, 2019 at 07:24:09PM +1000, Dave Chinner wrote:
> > > > >
> > > > > > So that leaves just the normal close() syscall exit case, where the
> > > > > > application has full control of the order in which resources are
> > > > > > released. We've already established that we can block in this
> > > > > > context. Blocking in an interruptible state will allow fatal signal
> > > > > > delivery to wake us, and then we fall into the
> > > > > > fatal_signal_pending() case if we get a SIGKILL while blocking.
> > > > >
> > > > > The major problem with RDMA is that it doesn't always wait on close() for the
> > > > > > MR holding the page pins to be destroyed. This is done to avoid a
> > > > > deadlock of the form:
> > > > >
> > > > > uverbs_destroy_ufile_hw()
> > > > > mutex_lock()
> > > > > [..]
> > > > > mmput()
> > > > > exit_mmap()
> > > > > remove_vma()
> > > > > fput();
> > > > > file_operations->release()
> > > >
> > > > I think this is wrong, and I'm pretty sure it's an example of why
> > > > the final __fput() call is moved out of line.
> > >
> > > Yes, I think so too, all I can say is this *used* to happen, as we
> > > have special code avoiding it, which is the code that is messing up
> > > Ira's lifetime model.
> > >
> > > Ira, you could try unraveling the special locking, that solves your
> > > lifetime issues?
> >
> > Yes I will try to prove this out... But I'm still not sure this fully solves
> > the problem.
> >
> > This only ensures that the process which has the RDMA context (RDMA FD) is safe
> > with regard to hanging the close for the "data file FD" (the file which has
> > pinned pages) in that _same_ process. But what about this scenario:
> >
> > Process A has the RDMA context FD and data file FD (with lease) open.
> >
> > Process A uses SCM_RIGHTS to pass the RDMA context FD to Process B.
>
> Passing the RDMA context dependent on a file layout lease to another
> process that doesn't have a file layout lease or a reference to the
> original lease should be considered a violation of the layout lease.
> Process B does not have an active layout lease, and so by the rules
> of layout leases, it is not allowed to pin the layout of the file.
>

I don't disagree with the semantics of this. I just don't know how to enforce
it.

> > Process A attempts to exit (hangs because data file FD is pinned).
> >
> > Admin kills process A. kill works because we have allowed for it...
> >
> > Process B _still_ has the RDMA context FD open _and_ therefore still holds the
> > file pins.
> >
> > Truncation still fails.
> >
> > Admin does not know which process is holding the pin.
> >
> > What am I missing?
>
> Application does not hold the correct file layout lease references.
> Passing the fd via SCM_RIGHTS to a process without a layout lease
> is equivalent to not using layout leases in the first place.

OK, so if I understand you correctly, you would support failing SCM_RIGHTS
in this case? I'm OK with that, but I'm not sure how to implement it right now.

To that end, I would like to simplify this slightly because I'm not convinced
that SCM_RIGHTS is a problem we need to solve right now, i.e., I don't know of a
user who wants to do this.

For now, duplication via SCM_RIGHTS could simply fail if _any_ file pins (and by
definition leases) exist underneath the "RDMA FD" (or other direct access FD,
like XDP etc.) being duplicated. Later, if this becomes a use case, we will need
to code up the proper checks, potentially within each of the subsystems. This
is because, with RDMA at least, there are potentially large numbers of MRs and
file leases which may have to be checked.
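
For reference, here is a minimal userspace sketch of the SCM_RIGHTS handoff
being discussed. It is not RDMA code and not part of this series. The helper
names send_fd() and recv_fd() are made up for illustration, and a plain
AF_UNIX stream socket is assumed. It only shows why Process B ends up with its
own reference to the same open file: the receiver gets a new descriptor for
the sender's struct file, so the sender closing (or being killed) does not
trigger ->release().

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: pass one open descriptor over a connected AF_UNIX
 * socket. Returns 0 on success, -1 on failure.
 */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* stream sockets need a byte of real data */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {			/* union keeps the cmsg buffer aligned */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: receive one descriptor sent by send_fd(). Returns the
 * new descriptor (a fresh reference to the sender's open file), or -1.
 */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the scenario above, Process A would call something like send_fd() with the
RDMA context FD, and Process B would keep whatever recv_fd() returns. Any check
that fails the transfer while pins exist would have to live on the kernel side
of that sendmsg(), which is the part I don't know how to implement yet.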

Ira