Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rwsemaphore
From: Peter Hurley
Date: Mon Mar 18 2013 - 21:01:45 EST
On Mon, 2013-03-18 at 16:58 -0700, Greg Kroah-Hartman wrote:
> On Mon, Mar 11, 2013 at 04:44:46PM -0400, Peter Hurley wrote:
> > The semantics of a rw semaphore are almost ideally suited
> > for tty line discipline lifetime management; multiple active
> > threads obtain "references" (read locks) while performing i/o
> > to prevent the loss or change of the current line discipline
> > (write lock).
> > Unfortunately, the existing rw_semaphore is ill-suited in other
> > ways;
> > 1) obtaining a "reference" can be recursive, i.e., a reference holder
> > may attempt to obtain another "reference". Recursive read locks
> > are not supported by rwsem.
> Why does a ldisc need to obtain this recursively?
You already discovered it doesn't (but it used to be required).
BTW, it was only because I had a real lock with lockdep support that this
recursive usage, which deadlocks, was even discoverable.
> > 2) TIOCSETD ioctl (change line discipline) expects to return an
> > error if the line discipline cannot be exclusively locked within
> > 5 secs. Lock wait timeouts are not supported by rwsem.
> Don't we have some other lock that can timeout?
Not that behaves like a r/w semaphore.
> > 3) A tty hangup is expected to halt and scrap pending i/o, so
> > exclusive locking must be prioritized without precluding
> > existing reference holders from obtaining recursive read locks.
> > Writer priority is not supported by rwsem.
> But how bad is it really if we have to wait a bit for that write lock to
> get through all of the existing readers? Either way, we are supposed to
> be dropping i/o, so it shouldn't be a big deal, right?
The rwsem behavior is in the process of changing. Write lock stealing
has already been added and refinements there will likely allow some
readers in front of writers.
With slow serial i/o, I'd rather have hangups occur promptly than let a
bunch more i/o through.
> > Add ld_semaphore which implements these requirements in a
> > semantically and operationally similar way to rw_semaphore.
> I _really_ don't want to add a new lock to the kernel, especially one
> that is only used by one "driver". You are going to have to convince
> the current lock authors that this really is needed, before I can take
> it, sorry.
That's fine. I can understand the reluctance to take on a new lock
[although you might be interested to read my analysis of rwsem here
https://lkml.org/lkml/2013/3/11/533 which outlines an existing flaw].
That said, part of the reason why the current ldisc implementation is
broken is the lack of appropriate locks. As I recently explained
(actually in this patchset's thread),
a lack of existing options has spawned a DIY approach without
higher-order locks that is rarely correct, but which goes largely
unnoticed exactly because it's not a new lock. A brief review of the
hangs, races, and deadlocks fixed by this patchset should be convincing
enough of that fact. In my opinion, this is the overriding concern.
The two main problems with a one-size-fits-all lock policy are that:
1) lock experts can't realistically foresee the consequences of policy
changes without already being experts in the subsystems in which that
lock is used. Even domain experts may miss potential consequences, and
2) domain experts typically wouldn't even consider writing a new lock.
So they make do with atomic bit states, spinlocks, reference counts,
mutexes, and waitqueues, making a mostly-functional, higher-order lock.
From whom would you like me to get an ack for this?
> What is wrong with the existing ldisc code that the creation of this
> lock is needed? Is our current code that broken?
Yes. Even just the acquisition of the ldisc reference is wrong [the
analysis is in the patch 21 changelog].
If you'd like, I can send you 6 or so short user test programs that
hang, crash, or deadlock inside 60 seconds on mainline and next, but not
with this patchset.