Re: write() returning EAGAIN

Andi Kleen (ak@muc.de)
19 Nov 1998 19:39:14 +0100


In article <13907.47974.162375.492776@localhost.efn.org>,
stevev@efn.org (Steve VanDevender) writes:
> I'm seeing a behavior in netscape I've never seen before now that
> I'm running 2.1.129pre6. Interestingly enough trying to read
> slashdot.org tends to get netscape stuck in a loop where it does:

> write(17, "GET /images/topics/topicslashdot"..., 346) = -1 EAGAIN (Try again)

> over and over again.

> I see that write() is documented to return EAGAIN under some
> circumstances, but I've never seen an application get stuck like
> this. Could something in 2.1.129pre6 be responsible?

I've seen it since early 2.1.x (maybe since the select->poll migration?).
It is probably some difference in poll/select handling, but I haven't tracked
it down yet.

Linux returns EAGAIN when the user tries to write to a not-yet-connected
socket that is set to non-blocking. The application is supposed to check with
poll/select first whether the socket is writable. If the socket's state changes
between the moment writability is signalled and the write itself, that would
explain the bug. I've stared at the relevant code paths in both 2.0 and 2.1
extensively and didn't find a difference that could explain it, so it
might be some race.
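
For reference, here is a minimal sketch of the pattern the application is
expected to follow: on EAGAIN, wait with poll() until the socket is reported
writable, check for a pending socket error, then retry the write. The helper
name write_when_ready and its timeout_ms parameter are my own inventions for
illustration; this is not netscape's code and not taken from the kernel. Note
that if the kernel kept signalling writability while write() kept returning
EAGAIN, a loop like this would spin exactly as in the strace output above.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write to a possibly non-blocking
 * socket, waiting for writability whenever write() reports EAGAIN.
 * Returns the number of bytes written (possibly fewer than len),
 * or -1 with errno set.
 */
ssize_t write_when_ready(int fd, const void *buf, size_t len, int timeout_ms)
{
    assert(fd >= 0 && buf != NULL);

    for (;;) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0)
            return n;                   /* wrote something */
        if (errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* real error */

        /* Not writable yet (e.g. connect still in progress): wait for it. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int ready;
        do {
            ready = poll(&pfd, 1, timeout_ms);
        } while (ready < 0 && errno == EINTR);
        if (ready < 0)
            return -1;
        if (ready == 0) {
            errno = ETIMEDOUT;          /* never became writable in time */
            return -1;
        }

        /* A failed connect also wakes poll(); fetch the pending error. */
        int err = 0;
        socklen_t errlen = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
            return -1;
        if (err != 0) {
            errno = err;
            return -1;
        }
        /* Reported writable with no pending error: retry the write. */
    }
}
```

The SO_ERROR check is there because an asynchronous error on the socket is
reported through poll() as well, and the caller would otherwise only find out
on the next write.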

What would help:
- Others looking at the code; more eyes may find more. Starting points
are tcp.c:wait_for_tcp_connect and tcp.c:tcp_poll (in 2.1) and tcp_select
(in 2.0).
- Someone capturing a tcpdump of such an incident (I suspect it has something
to do with asynchronous ICMP error handling, but that is just a theory).
- Someone capturing an strace log of it happening _including_ the last system
calls before the endless loop.

The current workaround is to use a proxy.

-Andi
