Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

From: Suren Baghdasaryan
Date: Tue Jan 17 2023 - 14:11:14 EST


On Tue, Jan 17, 2023 at 12:34 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Mon, 16 Jan 2023 20:52:45 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > No you are not.
> >
> > I'm not wrong or the other way around? Please expand a bit.
>
> You are not wrong.

Ok, I think if I rewrite the vma_read_trylock() we should be fine?:

static inline bool vma_read_trylock(struct vm_area_struct *vma)
{
int count, new;

/* Check before locking. A race might cause false locked result. */
if (READ_ONCE(vma->vm_lock->lock_seq) ==
READ_ONCE(vma->vm_mm->mm_lock_seq))
return false;

count = atomic_read(&vma->vm_lock->count);
for (;;) {
/*
* Is VMA is write-locked? Overflow might produce false
locked result.
* False unlocked result is impossible because we modify and check
* vma->vm_lock_seq under vma->vm_lock protection and
mm->mm_lock_seq
* modification invalidates all existing locks.
*/
if (count < 0)
return false;

new = count + 1;
/* If atomic_t overflows, fail to lock. */
if (new < 0)
return false;

/*
* Atomic RMW will provide implicit mb on success to pair
with smp_wmb in
* vma_write_lock, on failure we retry.
*/
new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
if (new == count)
break;
count = new;
cpu_relax();
}
if (unlikely(READ_ONCE(vma->vm_lock->lock_seq) ==
READ_ONCE(vma->vm_mm->mm_lock_seq))) {
if (atomic_dec_and_test(&vma->vm_lock->count))
wake_up(&vma->vm_mm->vma_writer_wait);
return false;
}
return true;
}
> > >
> > > If the writer lock owner is preempted by a reader while releasing lock,
> > >
> > > set count to zero
> > > <-- preempt
> > > wake up waiters
> > >
> > > then lock is owned by reader but with read waiters.
> > >
> > > That is buggy if write waiter starvation is allowed in this patchset.
> >
> > I don't quite understand your point here. Readers don't wait, so there
> > can't be "read waiters". Could you please expand with a race diagram
> > maybe?
>
> cpu3 cpu2
> --- ---
> taskA bond to cpu3
> down_write(&mm->mmap_lock);
> vma_write_lock L
> taskB fail to take L for read
> taskC fail to take mmap_lock for write
> taskD fail to take L for read
> vma_write_unlock_mm(mm);
>
> preempted by taskE
> taskE take L for read and
> read waiters of L, taskB and taskD,
> should be woken up
>
> up_write(&mm->mmap_lock);

Readers never wait for vma lock, that's why we have only
vma_read_trylock and no vma_read_lock. In your scenario taskB and
taskD will fall back to taking mmap_lock for read after they failed
vma_read_trylock. Once taskA does up_write(mmap_lock) they will be
woken up since they are blocked on taking mmap_lock for read.

>