Re: [Bug #14388] keyboard under X with 2.6.31

From: Linus Torvalds
Date: Tue Oct 13 2009 - 16:55:37 EST




On Tue, 13 Oct 2009, Boyan wrote:
>
> Composite is enabled in my X config, but I don't have compiz or
> something like that enabled. DRI is enabled.

I think I may actually see the problem. And if I'm right, then the bug you
guys bisected down to is really the fundamental reason. Embarrassing. I
was so convinced it should only change flush timing, that I didn't think
through all the possibilities.

The reason for thinking that it only really changes timing si fairly
simple: the only thing it really does is to call "flush_to_ldisc()"
synchronously when needed. On the face of it, that should be perfectly
safe.

But flush_to_ldisc() itself has a real oddity: it uses "tty->buf.lock" to
protect everything, BUT NOT THE ACTUAL CALL TO ->receive_buf()!

So even though that function looks _trivially_ atomic, once you look
deeper it suddenly becomes clear how it's not really atomic at all: it
will do all the buffer handling with the spinlock held, but then after it
has figured out the buffer, it does:

...
spin_unlock_irqrestore(&tty->buf.lock, flags);
disc->ops->receive_buf(tty, char_buf,
flag_buf, count);
spin_lock_irqsave(&tty->buf.lock, flags);
...

and by releasing that lock it actually seems to break all the buffering
guarantees! What can happen is:

CPU1 (or thread1 with PREEMPTION)
CPU2 (or thread2 with PREEMPTION)

flush_to_ldisc()
...
spin_lock_irqsave(..)
.. get one buffer..
spin_unlock_irqrestore(..);

<- PREEMPTION POINT, anything can happen ->
<- more buffers can be added, etc ->

flush_to_ldisc()
spin_lock_irqsave(..)
.. get second buffer..
spin_unlock_irqrestore(..);
->receive_buf(tty, char_buf, ...
spin_lock_irqrestore(..)
.. all done ..


->receive_buf(tty, char_buf, ...
spin_lock_irqrestore(...)

Notice how the "->receive_buf()" calls were done out of order, even if the
data was perfectly in-order in the buffers.

And you can get the same race on SMP even without preemption, just thanks
to CPU's hitting that lock just right. CONFIG_PREEMPT just makes it easier
(probably _much_ easier) to trigger, and possible even on UP.

As far as I can tell, this is not really a new bug (it could have happened
with low_latency before too), but on a tty without low_latency it would
never happen until the commit you bisected to because the workqueue itself
would serialize everything, and only one flush would ever be pending.

Anyway, the above explanation "feels right". It would easily explain the
behavior, because if the ->receive_buf() calls get re-ordered, then the
events get re-ordered, and one simple case of that would be to see the key
"release" event before the key "press" event.

It also explains how that commit seems to be indicated so consistently. It
still requires some specific timing, but now it's not timing _introduced_
by the commit, it's an old bug that that commit exposed, and then needs
some unlucky timing to actually happen.

The sane fix would be to just run ->receive_buf() under the tty->buf.lock,
but I assume we'd have a lot of unhappy ldiscs if we did that (and
possibly irq latency problems too).

I think the

tty->buf.head = NULL;
...
/* Restore the queue head */
tty->buf.head = head;

around that loop is actually there to try to avoid this whole problem, but
whoever did that didn't realize that there are other things that could set
buf.head (in particular, tty_buffer_request_room() while the lock is
dropped, so that whole logic is totally broken anyway and might even
conspire to make the problem worse (ie if somebody tries to add data while
->receive_buf() is running and the lock is gone, you are now _really_
screwing things up).

So instead of playing games with buf.head, I think we should just rely on
the TTY_FLUSHING bit. I'm not _entirely_ happy with this, because now if
we call flush_to_ldisc() while somebody else is busy flushing it, it will
return early even though the flush hasn't completed yet. But that was
always true to some degree (ie the "buffer full" case).

Anyway, I'm not entirely happy with this patch, and I haven't actually
TESTED it so it might well be totally broken, but something along the
lines of the appended may just fix it. It would be good if people who see
this problem tried it out.

Linus
---
drivers/char/tty_buffer.c | 31 +++++++++++++------------------
1 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/char/tty_buffer.c b/drivers/char/tty_buffer.c
index 3108991..da59334 100644
--- a/drivers/char/tty_buffer.c
+++ b/drivers/char/tty_buffer.c
@@ -402,28 +402,24 @@ static void flush_to_ldisc(struct work_struct *work)
container_of(work, struct tty_struct, buf.work.work);
unsigned long flags;
struct tty_ldisc *disc;
- struct tty_buffer *tbuf, *head;
- char *char_buf;
- unsigned char *flag_buf;

disc = tty_ldisc_ref(tty);
if (disc == NULL) /* !TTY_LDISC */
return;

spin_lock_irqsave(&tty->buf.lock, flags);
- /* So we know a flush is running */
- set_bit(TTY_FLUSHING, &tty->flags);
- head = tty->buf.head;
- if (head != NULL) {
- tty->buf.head = NULL;
- for (;;) {
- int count = head->commit - head->read;
+
+ if (test_and_set_bit(TTY_FLUSHING, &tty->flags)) {
+ struct tty_buffer *head;
+ while ((head = tty->buf.head) != NULL) {
+ int count;
+ char *char_buf;
+ unsigned char *flag_buf;
+
+ count = head->commit - head->read;
if (!count) {
- if (head->next == NULL)
- break;
- tbuf = head;
- head = head->next;
- tty_buffer_free(tty, tbuf);
+ tty->buf.head = head->next;
+ tty_buffer_free(tty, head);
continue;
}
/* Ldisc or user is trying to flush the buffers
@@ -445,9 +441,9 @@ static void flush_to_ldisc(struct work_struct *work)
flag_buf, count);
spin_lock_irqsave(&tty->buf.lock, flags);
}
- /* Restore the queue head */
- tty->buf.head = head;
+ clear_bit(TTY_FLUSHING, &tty->flags);
}
+
/* We may have a deferred request to flush the input buffer,
if so pull the chain under the lock and empty the queue */
if (test_bit(TTY_FLUSHPENDING, &tty->flags)) {
@@ -455,7 +451,6 @@ static void flush_to_ldisc(struct work_struct *work)
clear_bit(TTY_FLUSHPENDING, &tty->flags);
wake_up(&tty->read_wait);
}
- clear_bit(TTY_FLUSHING, &tty->flags);
spin_unlock_irqrestore(&tty->buf.lock, flags);

tty_ldisc_deref(disc);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/