Re: Wait for mutex to become unlocked

From: Thomas Gleixner
Date: Wed May 04 2022 - 20:22:38 EST


On Wed, May 04 2022 at 22:44, Matthew Wilcox wrote:
> The customer has a low priority task which wants to read /proc/pid/smaps
> of a higher priority task. Today, everything is awful; smaps acquires
> mmap_sem read-only, is preempted, then the high-pri task calls mmap()
> and the down_write(mmap_sem) blocks on the low-pri task. Then all the
> other threads in the high-pri task block on the mmap_sem as they take
> page faults because we don't want writers to starve.

Welcome to the wonderful world of priority inversion.

> So this is Good. For the vast majority of cases, we avoid taking the
> mmap read lock and the problem will appear much less often. But we can
> do Better with a new API. You see, for this case, we don't actually
> want to acquire the mmap_sem; we're happy to spin a bit, but there's no
> point in spinning waiting for the writer to finish when we can sleep.
> I'd like to write this code:
>
> again:
> rcu_read_lock();
> vma = vma_lookup();
> if (down_read_trylock(&vma->sem)) {
> rcu_read_unlock();
> } else {
> rcu_read_unlock();
> rwsem_wait_read(&mm->mmap_sem);
> goto again;
> }
>
> That is, rwsem_wait_read() puts the thread on the rwsem's wait queue,
> and wakes it up without giving it the lock. Now this thread will never
> be able to block any thread that tries to acquire mmap_sem for write.

Never?

if (down_read_trylock(&vma->sem)) {

---> preemption by writer

The writer will still block and depending on the rest of the runnable
threads it can take quite a while until the low prio reader comes back
on a CPU.

I grant you that the propability will decrease, but 'never' is just
wishful thinking.

> Similarly, it may make sense to add rwsem_wait_write() and mutex_wait().
> Perhaps also mutex_wait_killable() and mutex_wait_interruptible()
> (the combinatoric explosion is a bit messy; I don't know that it makes
> sense to do the _nested, _io variants).
>
> Does any of this make sense?

TBH, no.

If we start opening this can of worms, then we'll see tons of "customer
want's a pony" problems being solved by half baken "solutions" which
will exactly cause the combinatoric explosion you are worried about.

If there is a legitimate requirement to retrieve such information, then
we are better off thinking about a general approach of introspection,
which makes such information available as unreliable snapshots
retrievable with RCU protection.

The information gathered from /proc/pid/smaps is unreliable at the point
where the lock is dropped already today. So it does not make a
difference whether the VMAs have a 'read me if you really think it's
useful' sideband information which gets updated when the VMA changes and
allows to do:

rcu_read_lock();
vma = vma_lookup();
if (vma)
dump(vma->info);
rcu_read_unlock();

You still need to decide, whether you want to provide that information
or not for a particular interface, but that's way more sane than the
'make locking more complex for questionable value' approach.

But looking at the stuff which gets recomputed and reevaluated in that
proc/smaps code this makes a lot of sense, because most if not all of
this information is already known at the point where the VMA is modified
while holding mmap_sem for useful reasons, no?

So no, we don't want to add more magic locking functions which pretend
to solve unsolvable problems. We rather go and make use of information
which is already available by providing a less archaic access mechanism.

What you are trying to do here is just adding to technical debt IMNSHO.

Thanks,

tglx