Re: [Bug #14388] keyboard under X with 2.6.31

From: Linus Torvalds
Date: Tue Oct 13 2009 - 21:05:40 EST

On Tue, 13 Oct 2009, Paul Fulghum wrote:
>
> This is correct, the last buffer is not passed to tty_buffer_free()
> if it is the last in the list so tail is maintained.
> There is no free space in it so no new data can be added.
> There is no place where tail is null while the spinlock
> is released in preparation for calling receive_buf.
> I still can't spot any flaw in the current locking.

Do you even bother reading my emails?

Let me walk through an example of where the locking F*CKS UP, exactly
because it's broken.

  thread1                       thread2                     thread3

  flush_to_ldisc
    set_bit(TTY_FLUSHING)
    buf.head = NULL
    ...
    ..release lock..
    .. sleep in ->receive_buf ..

                                flush_to_ldisc
                                  set_bit(TTY_FLUSHING)
                                  .. head==NULL ..
                                  clear_bit(TTY_FLUSHING)
                                  .. release lock ..

                                                            tty_ldisc_flush()
                                                             -> tty_buffer_flush()
                                                                TTY_FLUSHING not set!
                                                                -> __tty_buffer_flush()
                                                                  -> tty->buf.tail = NULL

and now you're screwed. See? You have 'buf.tail' and 'buf.head' both NULL,
and look at what happens in 'tty_buffer_request_room()' in that case when
some new data comes in. Right: it will add the new buffer as both tail and
head.
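
For reference, that path in 2.6.31's drivers/char/tty_buffer.c looks
roughly like the following (simplified from memory, error handling trimmed;
tty_buffer_find() is a static helper in the same file):

	int tty_buffer_request_room(struct tty_struct *tty, size_t size)
	{
		struct tty_buffer *b, *n;
		int left;
		unsigned long flags;

		spin_lock_irqsave(&tty->buf.lock, flags);
		b = tty->buf.tail;
		left = b ? b->size - b->used : 0;

		if (left < size) {
			/* Slow path: we need a new buffer */
			n = tty_buffer_find(tty, size);
			if (n != NULL) {
				if (b != NULL) {
					b->next = n;
					b->commit = b->used;
				} else
					/* tail == NULL: the new buffer becomes
					 * BOTH head and tail, even though
					 * head == NULL was supposed to mean
					 * "a flush is running, hands off" */
					tty->buf.head = n;
				tty->buf.tail = n;
			} else
				size = left;
		}
		spin_unlock_irqrestore(&tty->buf.lock, flags);
		return size;
	}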

And notice how 'thread1' is still inside flush_to_ldisc()! When it finally
finishes, it writes its saved head pointer back, so the buffer that just
got added is overwritten by the old one, and now tail and head no longer
match. Or another flush_to_ldisc() comes in: it's not a no-op any more, it
will find the new data, and it will run ->receive_buf concurrently with the
old ->receive_buf from thread1.

And the whole reason was that there were some very odd locking rules:
buf.head == NULL meant "don't flush", "TTY_FLUSHING is set" meant "don't
clear buf.head", but the "don't flush" case still cleared TTY_FLUSHING
(after not flushing), and the whole thing fell apart.
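
To make those rules concrete, the pre-patch flush_to_ldisc() was shaped
roughly like this (heavily simplified; the actual receive loop and the
ldisc refcounting are elided):

	/* Rough shape of flush_to_ldisc() in 2.6.31, heavily simplified */
	static void flush_to_ldisc(struct work_struct *work)
	{
		struct tty_struct *tty =
			container_of(work, struct tty_struct, buf.work.work);
		struct tty_buffer *head;
		unsigned long flags;

		spin_lock_irqsave(&tty->buf.lock, flags);
		set_bit(TTY_FLUSHING, &tty->flags);	/* "a flush is running" */
		head = tty->buf.head;
		if (head != NULL) {
			tty->buf.head = NULL;	/* doubles as "don't flush" */
			/* ... walk the chain, dropping buf.lock around every
			 * ->receive_buf() call - that's where thread1 sleeps ... */
			tty->buf.head = head;	/* write the saved chain back */
		}
		/* BUG: cleared unconditionally - the head == NULL "nothing
		 * to do" path (thread2) wipes out the bit that thread1 set
		 * and is still relying on in ->receive_buf() */
		clear_bit(TTY_FLUSHING, &tty->flags);
		spin_unlock_irqrestore(&tty->buf.lock, flags);
	}

And the other half of the protocol: tty_buffer_flush() only defers to a
running flush (set TTY_FLUSHPENDING and wait) when it sees TTY_FLUSHING
set - which is exactly the test thread3 sails past once thread2 has
cleared the bit.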

I could have just fixed that one bug (move the "clear_bit(TTY_FLUSHING)"
up so that the head == NULL path doesn't wipe out somebody else's flag),
but the fact is, once you fix that, it becomes obvious that
"buf.head == NULL" is the wrong thing to test in the first place: we
should just use TTY_FLUSHING instead, and simply _remove_ the odd
"buf.head == NULL is special" case. Which is what my patch did.

> Your statement that the locking is too clever/subtle is
> clearly true since I am struggling to work this out again.

I have to say that the only case I could construct that is _clearly_ a bug
is the very contrived example above. I don't really think something like
it happens in reality. But it's an example of bad locking, and of what
happens when the locking logic isn't obvious.

There may be other cases where the locking fails, and I just didn't find
them.

Or the patch may simply not fix anything in practice, and nobody has ever
actually triggered the bad locking in real life. I dunno. I just do know
that the locking was too damn subtle.

Any time people do ad-hoc locking with "clever" schemes, it's almost
invariably buggy. So the rule is: just don't do that. Make the locking
rules "obvious". Don't have subtle rules about "if head is NULL, then
we're not going to add any new buffers to it, except if tail is also
NULL". Because look above what happens, and see how complicated it was to
even see the bug.

Linus