Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA

From: Jason Gunthorpe
Date: Wed Feb 06 2019 - 12:31:21 EST


On Wed, Feb 06, 2019 at 10:50:00AM +0100, Jan Kara wrote:

> MM/FS asks for lease to be revoked. The revoke handler agrees with the
> other side on cancelling RDMA or whatever and drops the page pins.

This takes a trip through userspace since the communication protocol
is entirely managed in userspace.

Most existing communication protocols don't have a 'cancel operation'.
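
As a rough illustration of why that matters, here is a toy model of what
the userspace side of a revoke could look like. This is only a sketch:
struct app_protocol, handle_revoke() and the enum are all invented for
this example, and none of it is existing kernel or rdma-core API.

```c
#include <stdbool.h>

/* Hypothetical model of a userspace-managed RDMA protocol. */
enum revoke_result {
	REVOKE_CLEAN,	/* peer agreed to stop, nothing fails */
	REVOKE_FORCED,	/* peer cannot be asked, connection torn down */
};

struct app_protocol {
	bool has_cancel_op;	/* can the app ask the remote side to stop? */
	bool connection_up;
	int pinned_pages;
};

/*
 * Assumed to run in the application after the kernel has told it that
 * FS/MM wants the lease back.
 */
static enum revoke_result handle_revoke(struct app_protocol *p)
{
	enum revoke_result res;

	if (p->has_cancel_op) {
		/* Peer stops targeting this memory, connection survives. */
		res = REVOKE_CLEAN;
	} else {
		/* Nothing to negotiate with: make future IO fail instead. */
		p->connection_up = false;
		res = REVOKE_FORCED;
	}

	/* Either way the pins go away. */
	p->pinned_pages = 0;
	return res;
}
```

With has_cancel_op false, which is the common case described above, the
only outcome in this model is REVOKE_FORCED, i.e. a communication
failure as seen by the application.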

> Now I understand there can be HW / communication failures etc. in
> which case the driver could either block waiting or make sure future
> IO will fail and drop the pins.

We can always rip things away from userspace.. However..

> But under normal conditions there should be a way to revoke the
> access. And if the HW/driver cannot support this, then don't let it
> anywhere near DAX filesystem.

I think the general observation is that people who want to do DAX &
RDMA want it to actually work, without data corruption, random process
kills or random communication failures.

Really, few users would actually want to run on a system where revoke
can be triggered.

So.. how can the FS/MM side provide a guarantee to the user that
revoke won't happen under a certain system design?

Jason