Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

From: Ingo Molnar
Date: Mon May 26 2008 - 10:00:30 EST



* Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxx> wrote:

> > in terms of debugging there's not much i can do i'm afraid. It's not
> > possible to get a tcpdump of this incident, given the extreme amount
> > of load these testboxes handle.
>
> ...but you can still tcpdump that particular flow once the situation
> is discovered to see if TCP still tries to do something, no? One needs
> to tcpdump for a couple of minutes at minimum. Also please get
> /proc/net/tcp for that flow around the same time.

ok, will try those.
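
For pulling just that flow's entry out of /proc/net/tcp, something along
these lines might do. This is a rough sketch and not something posted in
this thread: the function names are made up for illustration, the port
3632 is only distcc's default and is assumed here, and the address
decoding assumes IPv4 on a little-endian box such as x86.

```python
import socket
import struct


def parse_endpoint(field):
    """Decode one 'ADDR:PORT' hex field from /proc/net/tcp (IPv4 only)."""
    addr_hex, port_hex = field.split(":")
    # Assumption: little-endian host, so the printed 32-bit value has the
    # address bytes reversed (0100007F is 127.0.0.1).
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def parse_tcp_line(line):
    """Parse one socket line of /proc/net/tcp into a dict."""
    fields = line.split()
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local": parse_endpoint(fields[1]),
        "remote": parse_endpoint(fields[2]),
        "state": int(fields[3], 16),      # 1 means ESTABLISHED
        "tx_queue": int(tx_hex, 16),
        "rx_queue": int(rx_hex, 16),
        "retrnsmt": int(fields[6], 16),
    }


def find_flow(lines, port):
    """Return parsed entries whose local or remote port equals `port`."""
    matches = []
    for line in lines:
        parts = line.split()
        # Socket lines start with an index such as '0:'; skip the header.
        if not parts or not parts[0].endswith(":"):
            continue
        entry = parse_tcp_line(line)
        if port in (entry["local"][1], entry["remote"][1]):
            matches.append(entry)
    return matches


def read_flow(port=3632, path="/proc/net/tcp"):
    """Read the live table and return the entries for `port`."""
    with open(path) as table:
        return find_flow(table.readlines(), port)
```

Calling read_flow() around the same time as the tcpdump run would give the
queue sizes and retransmit counter for the suspect connection.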

> > One clue (which might or might not matter) is that distcc is one of
> > the very few applications that makes use of sendfile().
>
> Can you please try with /proc/sys/net/ipv4/tcp_frto set to zero? It
> would be weird for the recv-q symptom to be related to that, but there
> were some recent fixes to FRTO, and the retrans_stamp change could
> have some significance here.
>
> Other than that, nothing since -rc1 seems suspicious to me (though I
> hardly understand every part of networking).

ok, i will first wait for it to trigger on a box and will do the tcpdump
session (and /proc/net/tcp output), then i'll continue the tests with
this done in the rc.local:

echo 0 > /proc/sys/net/ipv4/tcp_frto

and will see whether the hung connections still occur. The cycle of
testing will be very slow i suspect.
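
Since the cycle is slow, noticing the hang could perhaps be automated by
comparing two snapshots of /proc/net/tcp taken a couple of minutes apart.
The sketch below is only an illustration under a stated assumption: it
treats "stuck" as an ESTABLISHED socket whose non-empty tx/rx queues are
identical in both snapshots, which is a guess at the symptom and not a
definition from this thread. The function names are hypothetical.

```python
def queues_by_socket(lines):
    """Map (local, remote) hex endpoints to (tx_queue, rx_queue)."""
    result = {}
    for line in lines:
        fields = line.split()
        # Skip the header line and anything too short to be a socket line.
        if len(fields) < 5 or not fields[0].endswith(":"):
            continue
        if fields[3] != "01":  # 01 is TCP_ESTABLISHED
            continue
        tx_hex, rx_hex = fields[4].split(":")
        result[(fields[1], fields[2])] = (int(tx_hex, 16), int(rx_hex, 16))
    return result


def stuck_candidates(before, after):
    """Sockets present in both snapshots with unchanged, non-empty queues."""
    old = queues_by_socket(before)
    new = queues_by_socket(after)
    return sorted(
        key
        for key, queues in new.items()
        if old.get(key) == queues and any(queues)
    )
```

Each argument is the list of lines read from /proc/net/tcp at one point in
time. A busy but healthy socket can also show the same queue size twice by
chance, so any hit is only a hint to start the tcpdump session.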

Ingo