Re: kernel panics at google

From: Alan Cox (alan@lxorguk.ukuu.org.uk)
Date: Thu Jan 20 2000 - 10:58:16 EST


> If I'm interpreting the kernel oops correctly, the error occurs in
> tcp_retrans_try_collapse. I've attached below the oops, after
> attempting to run it through ksymoops. (I think I got the right
> symbols, but I'm not absolutely certain.)

It seems to , although the error appears to have occurred at some point
before that, and tcp_retrans_try_collapse gets handed a buffer which appears
to either have been freed, or not to be on a list (skb->next == NULL).

> We could really use informed suggestions about what could be triggering
> this, or how we might try to work around it. Thanks very much in
> advance for any help.

Workaround suggestion: echo "0" >/proc/sys/net/ipv4/tcp_retrans_collapse

> >>EIP: c0158892 <tcp_retrans_try_collapse+3a/208>
> Trace: c01462cb <__kfree_skb+97/a0>
> Trace: c0158be7 <tcp_retransmit_skb+a3/164>
> Trace: c0158d0d <tcp_xmit_retransmit_queue+65/e4>

This seems to to be the interesting area and looks suspicious:

Dave: Side item tcp_retrans_try_collapse doesnt seem to care about collapsing
in SACK'd data - is that right ?

Second question: tcp_xmit_retransmit_queue is entered with tcp->retrans_head
pointing part way into the retransmit queue if we baled part way through
the previous retransmit run (eg cos cwnd got us). We guarantees that
tp->retrans_head is still pointing to a valid packet so right now I
cant see a cause.

Do all the crashes basically look identical ?

Alan

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:22 EST