[BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )

From: Cabaniols, Sebastien (Sebastien.Cabaniols@Compaq.com)
Date: Thu May 03 2001 - 11:16:02 EST


Hello,

I have a bug on an Alpha ES40 SMP 2.4.4.ac3 modified (TCP Bug from lkml)

Platform:

Linux Version:
-----------------------

My kernel is 2.4.4-ac3 with the tcp.c file modified as suggested by the
following patch.

>I see! Dave, please, take the second Andrea's patch (appended).
>It is really the cleanest one.

>Alexey

>--- 2.4.4aa3/net/ipv4/tcp.c.~1~ Tue May 1 10:44:57 2001
>+++ 2.4.4aa3/net/ipv4/tcp.c Tue May 1 12:00:25 2001
>@@ -1183,11 +1183,8 @@
 
> do_fault:
> if (skb->len==0) {
>- if (tp->send_head == skb) {
>- tp->send_head = skb->next;
>- if (tp->send_head == (struct
sk_buff*)&sk->write_queue)
>- tp->send_head = NULL;
>- }
>+ if (tp->send_head == skb)
>+ tp->send_head = NULL;
> __skb_unlink(skb, skb->list);
> tcp_free_skb(sk, skb);
> }
>
>-

This time, to show that it has nothing to do with the ftp server I used a
simple
rcp:

Experiment 1:
----------------------

 ES40-06 ES40-05

 rcp es40-05:/mnt/big/mid /tmp/toto Machine fine

 with a mid file not too big (1.4Megabytes) everything is fine
 

Experiment 2:
----------------------

 ES40-06 ES40-05

 rcp es40-05:/mnt/big/1Giga /tmp/toto Machine frozen

 the ES40-06 managed to retrieve only 11 Mbytes so I guess I can start again
with a 12 Megabytes file, It should trigger the bug.

Here is the log of the machine who crashed:
-----------------------------------------------------------------------

May 3 17:27:57 es40-05 PAM_unix[651]: (system-auth) session opened for user
root by (uid=0)
May 3 17:27:57 es40-05 in.rshd[651]: root@es40-06.idris.domain as root:
cmd='rcp -f /mnt/big/mid'
May 3 17:29:36 es40-05 PAM_unix[662]: (system-auth) session opened for user
root by (uid=0)
May 3 17:29:36 es40-05 in.rshd[662]: root@es40-06.idris.domain as root:
cmd='rcp -f /mnt/big/1Giga'
May 3 17:29:36 es40-05 kernel: <oomerang_rx(): status e001
May 3 17:29:36 es40-05 kernel: <<7>eth0: interrupt, status e401, latency 4
ticks.
May 3 17:29:36 es40-05 kernel: .
May 3 17:29:36 es40-05 kernel: <th0: interrupt, status e401, latency 3
ticks.
May 3 17:29:36 es40-05 kernel: <7
May 3 17:29:36 es40-05 kernel: <7t()
May 3 17:29:37 es40-05 kernel: <01, latency 4 ticks.
May 3 17:29:37 es40-05 kernel: <7
May 3 17:29:37 es40-05 kernel: <7
May 3 17:29:37 es40-05 kernel: th0: interrupt, status e401, latency 4
ticks.
May 3 17:29:37 es40-05 kernel: <7o send a packet, Tx index 5905.
May 3 17:29:37 es40-05 kernel: <7<7>eth0: exiting interrupt, status e000.
May 3 17:29:37 es40-05 kernel: e201.
May 3 17:29:37 es40-05 kernel: <7<7>eth0: In interrupt loop, status e401.
May 3 17:29:37 es40-05 kernel: <7omerang_start_xmit()
May 3 17:29:37 es40-05 kernel: <7omerang_start_xmit()

The next line is:
--------------------------
May 3 17:36:17 es40-05 syslogd 1.3-3: restart.

What could I do to be sure where the problem is ?

I tested the machine under high cpu load, memory, swap, combination of the
three.
The only thing that does not work under load is the network.... TCP/IP ?

Andrew Morton is pretty sure this has nothing to do with his driver...

Any ideas of how I could find where the problem is ?

Thx for any help.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon May 07 2001 - 21:00:17 EST