Re: [PATCH v3 00/13] locks: saner method for managing file locks

From: Jeff Layton
Date: Mon Feb 02 2015 - 15:42:53 EST


On Mon, 2 Feb 2015 15:29:33 -0500
Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:

> I applied Jeff's patch to my orangefs-3.19-rc5 tree. Orangefs
> doesn't yet support the lock callout for file_operations, but
> we have been experimenting with some ideas that would allow
> Orangefs to honor locks in our distributed environment: basically
> posix locks for each kernel client plus metadata in our userspace
> servers.
>
> Anyhow, the lock callout in the Orangefs patch I've posted
> just returns ENOSYS. I have added posix locks in a current
> test, so now I have:
>
> int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
> {
>         int rc;
>
>         rc = posix_lock_file(filp, fl, NULL);
>
>         return rc;
> }
>
> So... while this isn't safe for our real world distributed environment,
> it makes the kernel with this version of the Orangefs kernel client
> loaded on it think that locks work.
>
> Besides observing that locks seem to work by running some programs
> that need locks (like svn/sqlite) I also ran xfstest generic/131.
>
> Without Jeff's patch, xfstest generic/131 fails:
>
> 29:Verify that F_GETLK for F_WRLCK doesn't require that file be
> opened for write
>
> It fails the same way with Jeff's patch applied.
>
> Jeff's patch applied painlessly, and my simple tests didn't uncover
> any problems with Jeff's implementation of separate lists for different
> lock types, so that's good.
>
> What surprised me, though, is that the posix lock code failed
> to get all the way through xfstest generic/131... maybe you
> all knew that already?
>
> -Mike
>

Hmm... I haven't seen 131 fail like this, but your code above looks
wrong. You're ignoring the "cmd" argument, so even F_GETLK requests are
setting a lock.

I think you might want to do something like:

int pvfs2_lock(struct file *filp, int cmd, struct file_lock *fl)
{
        if (cmd == F_GETLK)
                return posix_test_lock(filp, fl);
        return posix_lock_file(filp, fl, NULL);
}

...untested of course, but you get the idea.


> On Thu, Jan 22, 2015 at 9:27 AM, Jeff Layton
> <jeff.layton@xxxxxxxxxxxxxxx> wrote:
> > v3:
> > - break out a ceph locking cleanup patch into a separate one earlier
> > in the series
> > - switch back to using the i_lock to protect assignment of i_flctx.
> > Using cmpxchg() was subject to races that I couldn't quite get a
> > grip on. Eventually I may switch it back, but it probably doesn't
> > provide much benefit.
> > - add a patch to clean up some comments that refer to i_flock
> > - use list_empty_careful in lockless checks rather than list_empty
> >
> > v2:
> > - bugfix to the flc_posix list ordering that broke the split/merge code
> > - don't use i_lock to manage the i_flctx pointer. Do so locklessly
> > via cmpxchg().
> > - reordering of the patches to make the set bisectable. As a result
> > the new spinlock is not introduced until near the end of the set
> > - some cleanup of the lm_change operation was added, made possible
> > by the move to standard list_heads for tracking locks
> > - it takes greater pains to avoid spinlocking by checking when the
> > lists are empty in addition to whether the i_flctx pointer is NULL
> > - a patch was added to keep count of the number of locks, so we can
> > avoid having to do count/alloc/populate in ceph and cifs
> >
> > This is the third iteration of this patchset. The second was posted
> > on January 8th, under the cover letter entitled:
> >
> > [PATCH v2 00/10] locks: saner method for managing file locks
> >
> > The big change here is that it once again uses the i_lock to protect the
> > i_flctx pointer assignment. In principle we should be able to use
> > cmpxchg instead, but that seems to leave open a race that causes
> > locks_remove_file to miss recently-added locks. Given that this is not
> > a super latency-sensitive codepath, an extra spinlock shouldn't make much
> > difference.
> >
> > Many thanks to Sasha Levin who helped me nail a race that would leave
> > leftover locks on the i_flock list, and David Howells who convinced
> > me to just go ahead back to using the i_lock to protect that field.
> >
> > There are some other minor changes, but overall it's pretty similar
> > to the last set. I still plan to merge this for v3.20.
> >
> > ------------------------[snip]-------------------------
> >
> > We currently manage all file_locks via a singly-linked list. This is
> > problematic for a number of reasons:
> >
> > - we have to protect all file locks with the same spinlock (or
> > equivalent). Currently that uses the i_lock, but Christoph has voiced
> > objections due to the potential for contention with other i_lock
> > users. He'd like to see us move to using a different lock.
> >
> > - we have to walk through irrelevant file locks in order to get to the
> > ones we're interested in. For instance, POSIX locks are at the end
> > of the list, so we have to skip over all of the flock locks and
> > leases before we can work with them.
> >
> > - the singly-linked list is primitive and difficult to work with. We
> > have to keep track of the "before" pointer and it's easy to get that
> > wrong.
> >
> > Cleaning all of this up is complicated by the fact that no one really
> > wants to grow struct inode in order to do so. We have a single pointer
> > in the inode now and I don't think we want to use any more.
> >
> > One possibility that Trond raised was to move this to an hlist, but
> > that doesn't do anything about the desire for a new spinlock.
> >
> > This patchset takes the approach of replacing the i_flock list with a
> > new struct file_lock_context that is allocated when we intend to add a
> > new file lock to an inode. The file_lock_context is only freed when we
> > destroy the inode.
> >
> > Within that, we have separate (and standard!) lists for each lock type,
> > and a dedicated spinlock for managing those lists. In principle we could
> > even consider adding separate locks for each, but I didn't bother with
> > that for now.
> >
> > For now, the code is still pretty "raw" and isn't bisectable. This is
> > just an RFC for the basic approach. This is probably v3.19 material at
> > best.
> >
> > Anyone have thoughts or comments on the basic approach?
> >
> > Jeff Layton (13):
> > locks: add new struct list_head to struct file_lock
> > locks: have locks_release_file use flock_lock_file to release generic
> > flock locks
> > locks: add a new struct file_locking_context pointer to struct inode
> > ceph: move spinlocking into ceph_encode_locks_to_buffer and
> > ceph_count_locks
> > locks: move flock locks to file_lock_context
> > locks: convert posix locks to file_lock_context
> > locks: convert lease handling to file_lock_context
> > locks: remove i_flock field from struct inode
> > locks: add a dedicated spinlock to protect i_flctx lists
> > locks: clean up the lm_change prototype
> > locks: keep a count of locks on the flctx lists
> > locks: consolidate NULL i_flctx checks in locks_remove_file
> > locks: update comments that refer to inode->i_flock
> >
> > fs/ceph/locks.c | 64 +++---
> > fs/ceph/mds_client.c | 4 -
> > fs/cifs/file.c | 34 +--
> > fs/inode.c | 3 +-
> > fs/lockd/svcsubs.c | 26 ++-
> > fs/locks.c | 569 +++++++++++++++++++++++++++------------------------
> > fs/nfs/delegation.c | 23 ++-
> > fs/nfs/nfs4state.c | 70 ++++---
> > fs/nfs/pagelist.c | 6 +-
> > fs/nfs/write.c | 41 +++-
> > fs/nfsd/nfs4state.c | 21 +-
> > fs/read_write.c | 2 +-
> > include/linux/fs.h | 52 +++--
> > 13 files changed, 510 insertions(+), 405 deletions(-)
> >
> > --
> > 2.1.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>