Re: deadlock in synchronize_srcu() in debugfs?

From: Johannes Berg
Date: Fri Mar 31 2017 - 05:45:02 EST


On Fri, 2017-03-31 at 11:03 +0200, Nicolai Stange wrote:

> > 2)
> > There's a complete deadlock situation if this happens:
> >
> > CPU1 CPU2
> >
> > debugfs_file_read(file="foo") mutex_lock(&M);
> > srcu_read_lock(&debugfs_srcu); debugfs_remove(file="
> > bar")
> > mutex_lock(&M); synchronize_srcu(&de
> > bugfs_srcu)
> >
> > This is intrinsically unrecoverable.
>
> Let's address this in a second step.

I suspect that it's actually better to address both in the same step,
but whatever :)

> > That seems like a strange argument to me - something has to exist
> > for a process to be able to look up the file, and currently the
> > proxy also has to exist?
>
> No, the proxies are created at file _open_ time and installed at the
> struct file.
>
> Rationale: there are potentially many debugfs files with only few of
> them opened at a time and a proxy, i.e. a struct file_operations, is
> quite large.

Ok, that makes sense. But that's not really a show-stopper, is it?

You can either have a proxy or not have it at remove time, and if you
don't have one then you can remove safely, right? And if you do have a
proxy, then you have to write_lock() it.

Lookup of the proxy itself can still be protected by (S)RCU, but you
can't go into the debugfs file callbacks while you hold (S)RCU, so that
you can safely determine whether or not a proxy exists.

I'm handwaving though - there are problems here with freeing the proxy
again when you close a file. Perhaps something like
* first, remove the pointer and wait for a grace period
* write_lock() it to make sure nobody is still inside it
* delete it now

works.

> I'll work out a solution this weekend and send some RFC patches then.
>
Thanks!

johannes