Re: Handshaking on USB serial devices

From: David Newall
Date: Thu Feb 21 2008 - 10:21:57 EST


Alan,

Alan Cox wrote:
>> That's a very good point. Even so: on the 2.4 driver, write_room isn't
>> implemented (refer to a previous message by Alan); and in 2.6, a 1k
>> buffer is built into the driver, with nothing to prevent it being sent
>> when the hardware buffer fills.
>>
[...]

> Careful - a lot of hardware handles this properly itself, you simply
> don't get the URB completing until its all done.
>

I am incredibly grateful to you for telling me this. I didn't know it
and hadn't noticed that that was occurring; it was. I thought data was
being lost by the converter -- I could see where it was missing on the
screen! -- but it's actually a bug in the tty layer.

Here's what really was happening: The PL2303 contains a 256 byte buffer,
and write URBs don't complete when that's full, as you said. At such
times, that is when a bulk write was pending, the tty layer could and
would still call pl2303_write, which would return 0. The tty layer
would loop, trying one byte followed by the remaining count, repeating
until no data remained. The following clause, from write_chan in
n_tty.c, appears responsible; it is within a while(1) loop:

if (O_OPOST(tty) && !(test_bit(TTY_HW_COOK_OUT, &tty->flags))) {
while (nr > 0) {
ssize_t num = opost_block(tty, b, nr);
if (num < 0) {
if (num == -EAGAIN)
break;
retval = num;
goto break_out;
}
b += num;
nr -= num;
if (nr == 0)
break;
get_user(c, b);
if (opost(c, tty) < 0)
break;
b++; nr--;
}
if (tty->driver.flush_chars)
tty->driver.flush_chars(tty);
} ...


Note that opost() assumes that driver.putchar (which translates to a
one-byte write) always succeeds. This behaviour is documented.

By the way, the true cause of the problem was somewhat disguised by the
fact that space in the PL2303's buffer could become available part-way
through the cycle, causing non-deterministic (i.e. not exactly
reproducible) loss of part of a write; it looked exactly like a flow
control problem.

The (2.4, remember) PL2303 driver has no write_room function, which I
gather is the appropriate place to signal that a write cannot be
performed. Providing one that returns 0 when the (singleton) write urb
is busy, resolves the issue. The patch that I had previously been
developing is entirely wrong. Oh well. Mind you, providing a
write_room function is NOT a real solution; it merely reduces the
condition to a usually-winnable race. (Ironically, when I back-ported
the 2.6 driver, I excluded its new write_room function. Mistake.)

Thanks for the nugget.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/