Re: [PATCH 1/6] lib: Implement range locks

From: Jan Kara
Date: Mon Feb 11 2013 - 07:58:37 EST


On Mon 11-02-13 03:03:30, Michel Lespinasse wrote:
> On Mon, Feb 11, 2013 at 2:27 AM, Jan Kara <jack@xxxxxxx> wrote:
> > On Sun 10-02-13 21:42:32, Michel Lespinasse wrote:
> >> On Thu, Jan 31, 2013 at 1:49 PM, Jan Kara <jack@xxxxxxx> wrote:
> >> > +void range_lock_init(struct range_lock *lock, unsigned long start,
> >> > + unsigned long end);
> >> > +void range_lock(struct range_lock_tree *tree, struct range_lock *lock);
> >> > +void range_unlock(struct range_lock_tree *tree, struct range_lock *lock);
> >>
> >> Is there a point to separating the init and lock stages? Maybe the API could be
> >> void range_lock(struct range_lock_tree *tree, struct range_lock *lock,
> >> unsigned long start, unsigned long last);
> >> void range_unlock(struct range_lock_tree *tree, struct range_lock *lock);
> > I was thinking about this as well. Currently I don't have a place which
> > would make it beneficial to separate _init and _lock but I can imagine such
> > uses (where you don't want to pass the interval information down the stack
> > and it's easier to pass the whole lock structure). Also it looks a bit
> > confusing to pass (tree, lock, start, last) to the locking function. So I
> > left it there.
> >
> > OTOH I had to change the API somewhat so that the locking phase is now
> > split into a "lock_prep" phase, which inserts the node into the tree and
> > counts blocking ranges, and a "wait" phase, which waits for the blocking
> > ranges to unlock. The reason for this split is that while "lock_prep" needs
> > to happen under some lock synchronizing operations on the tree, the "wait"
> > phase can easily be lockless. This allows me to remove the knowledge of how
> > operations on the tree are synchronized from the range locking code itself.
> > That in turn allowed me to use mapping->tree_lock for synchronization and
> > basically reduce the cost of mapping range locking to close to 0 for
> > buffered IO (just a single lookup in the tree in the fast path).
>
> Ah yes, being able to externalize the lock is good.
>
> I think in this case, it makes the most sense for lock_prep phase to
> also initialize the lock node, though.
I guess so.
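
A prep phase that also initializes the node, followed by a lockless wait, could look roughly like the sketch below. It is a minimal userspace model, not the code from the patch set: the names range_lock_prep and range_lock_wait, the struct layouts and the seq field are hypothetical, a linked list stands in for the kernel interval tree, the caller is assumed to hold some external lock (for example mapping->tree_lock) around prep and unlock, and the wait loop spins with sched_yield() where kernel code would sleep.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical range lock node; the range is inclusive [start, last]. */
struct range_lock {
	unsigned long start, last;
	unsigned long seq;          /* insertion order within the tree */
	atomic_uint blocking;       /* overlapping ranges inserted before us */
	struct range_lock *next;
};

/* Hypothetical container; a real version would use an interval tree. */
struct range_lock_tree {
	struct range_lock *head;
	unsigned long next_seq;
};

static int ranges_overlap(const struct range_lock *a, const struct range_lock *b)
{
	return a->start <= b->last && b->start <= a->last;
}

/* Phase 1, called with the external tree lock held: initialize the node,
 * count the overlapping ranges already present, and insert the node. */
void range_lock_prep(struct range_lock_tree *tree, struct range_lock *lock,
		     unsigned long start, unsigned long last)
{
	unsigned int blockers = 0;

	lock->start = start;
	lock->last = last;
	lock->seq = tree->next_seq++;
	for (struct range_lock *n = tree->head; n; n = n->next)
		if (ranges_overlap(n, lock))
			blockers++;
	atomic_init(&lock->blocking, blockers);
	lock->next = tree->head;
	tree->head = lock;
}

/* Phase 2, no tree lock needed: wait until every blocker has unlocked. */
void range_lock_wait(struct range_lock *lock)
{
	while (atomic_load(&lock->blocking) != 0)
		sched_yield();
}

/* Called with the external tree lock held: remove the node, then unblock
 * overlapping ranges inserted after it (only those counted it as a blocker). */
void range_unlock(struct range_lock_tree *tree, struct range_lock *lock)
{
	struct range_lock **pp = &tree->head;

	while (*pp != lock) {
		assert(*pp != NULL);    /* lock must be in the tree */
		pp = &(*pp)->next;
	}
	*pp = lock->next;
	for (struct range_lock *n = tree->head; n; n = n->next)
		if (n->seq > lock->seq && ranges_overlap(n, lock))
			atomic_fetch_sub(&n->blocking, 1);
}
```

Recording an insertion sequence number is one way to let range_unlock decrement only the waiters that actually counted the departing range; ranges inserted earlier never counted it, so they must not be touched.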

> >> Reviewed-by: Michel Lespinasse <walken@xxxxxxxxxx>
> > I actually didn't add this because there are some differences in the
> > current version...
>
> Did I miss another posting of yours, or is that coming up ?
That will come. But as Dave Chinner pointed out, for buffered writes we
should rather lock the whole range specified in the syscall (to avoid
strange results of racing truncate / write when i_mutex isn't used), and
that requires us to put the range lock above mmap_sem, which isn't currently
easily possible due to page fault handling... So if the whole patch set
is to go anywhere, I need to solve that somehow.
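
Locking the whole range specified in the syscall means converting the write's byte offset and length into a range before any page is touched. A small sketch, assuming the range lock covers page-cache indices and a 4 KiB page size (SKETCH_PAGE_SHIFT and write_page_range are hypothetical names; the real page shift is architecture-dependent):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Hypothetical helper: compute the inclusive page-index range [*first, *last]
 * touched by a write of count bytes at byte offset pos. Returns false for an
 * empty write, which needs no range lock at all. */
bool write_page_range(unsigned long long pos, size_t count,
		      unsigned long *first, unsigned long *last)
{
	if (count == 0)
		return false;
	assert(pos + count >= pos);     /* assumption: pos + count does not wrap */
	*first = (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
	*last = (unsigned long)((pos + count - 1) >> SKETCH_PAGE_SHIFT);
	return true;
}
```

A write of 2 bytes at offset 4095 straddles a page boundary, so it yields the range [0, 1] rather than a single page.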

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR