Re: 2.1.125 oops in ip_queue_xmit...

Tomasz Przygoda (tprzyg@securities.com)
Mon, 30 Nov 1998 17:49:48 -0500


OK, it seems I will apply the "little" patch that Dave incorporated into
his source tree. Should I still fiddle with the other patches? As far as I
can see, the only parts of those "other" patches that actually change
anything are the synchronize_bh() call and the memset on the freed dst
entry. The rest looks (as Alexey mentioned) like checkpoints only.

BTW, here are the answers to Alexey's questions:

Q: what gcc did you use to compile kernel?
A: 2.7.2.3-8 (RH)

Q: did you see something strange in the logs before the crash?
A: The "cool" thing about these crashes (I had more of them before) is that
they barely left _any_ traces: nothing in the logfiles, sometimes nothing on
the screen. This was only the second time I had something reasonable on the
screen. The first time I assumed I didn't have to write it down because it
would show up in the logs. Wrong assumption :( so this time I had pencil and
paper ready :)

Q: did you ever see this without raid?
A: I've never seen it, but mainly because I never tried; it's a fairly busy
server :)

Anyway, thanks a lot for the bugfix. I'm recompiling the kernel right now!

Alexey Kuznetsov wrote:

> Hello!
>
> > >>EIP: c0154208 <ip_queue_xmit+27c/344>
> > Trace: c015a9b1 <tcp_transmit_skb+3b1/3bc>
> > Trace: c015be49 <tcp_send_ack+89/90>
> > Trace: c015c298 <tcp_delack_timer+0/28>
>
> Am I hallucinating, or am I still sane?
> It seems tcp_delack_timer() is called on a locked socket.
>
> Tomasz, I think the appended patch will solve your problem.
>
> I am not sure that it is finally correct, though.
> Dave, Andi, Andrey, please check.
>
> I also suspect that Ted Deppner's report (equally obscure) is closed
> by this fix too.
>
> Alexey
>
> --- linux/net/ipv4/tcp_timer.c.orig Mon Nov 30 19:04:08 1998
> +++ linux/net/ipv4/tcp_timer.c Mon Nov 30 19:10:24 1998
> @@ -170,8 +170,12 @@
>
> if(!sk->zapped &&
> sk->tp_pinfo.af_tcp.delayed_acks &&
> - sk->state != TCP_CLOSE)
> - tcp_send_ack(sk);
> + sk->state != TCP_CLOSE) {
> + if (atomic_read(&sk->sock_readers) == 0)
> + tcp_send_ack(sk);
> + else
> + tcp_send_delayed_ack(&(sk->tp_pinfo.af_tcp), HZ/10);
> + }
> }
>
> void tcp_probe_timer(unsigned long data)
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

and

Alexey Kuznetsov wrote:

> Hello!
>
> > I have "mystical" crashes with oops in the networking code.
> > It's pretty reproducible; it just takes time (between a few days and a
> > few weeks) and a lot of network activity.
>
> OK. Let me ask a couple of questions:
> - what gcc did you use to compile kernel?
> - did you see something strange in logs before crash?
> - did you ever see this without raid?
>
> The bug looks pretty difficult. Could you apply the following
> patch and then wait for a crash?
>
> This patch adds only one bug fix (which is unlikely to bite you),
> but adds a couple of checks.
>
> Alexey Kuznetsov
>
> diff -ur linux-orig/include/net/dst.h linux/include/net/dst.h
> --- linux-orig/include/net/dst.h Wed Aug 26 21:14:55 1998
> +++ linux/include/net/dst.h Sun Nov 29 22:54:42 1998
> @@ -94,8 +94,11 @@
> extern __inline__
> void dst_release(struct dst_entry * dst)
> {
> - if (dst)
> + if (dst) {
> atomic_dec(&dst->use);
> + if (atomic_read(&dst->use) < 0)
> + printk("!!! Underflow %p\n", dst);
> + }
> }
>
> extern __inline__
> diff -ur linux-orig/net/core/dst.c linux/net/core/dst.c
> --- linux-orig/net/core/dst.c Sat Mar 21 20:58:08 1998
> +++ linux/net/core/dst.c Sun Nov 29 22:54:42 1998
> @@ -141,5 +141,6 @@
> if (dst->ops->destroy)
> dst->ops->destroy(dst);
> atomic_dec(&dst_total);
> + memset(dst, 0xB2, sizeof(*dst));
> kfree(dst);
> }
> diff -ur linux-orig/net/ipv4/route.c linux/net/ipv4/route.c
> --- linux-orig/net/ipv4/route.c Sat Oct 3 20:17:44 1998
> +++ linux/net/ipv4/route.c Sun Nov 29 22:54:42 1998
> @@ -305,6 +305,8 @@
> if ((rth = xchg(&rt_hash_table[i], NULL)) == NULL)
> continue;
>
> + synchronize_bh();
> +
> for (; rth; rth=next) {
> next = rth->u.rt_next;
> rth->u.rt_next = NULL;
> @@ -567,7 +569,8 @@
> static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
> {
> struct rtable *rt = (struct rtable*)dst;
> -
> +
> +#if 0
> if (rt != NULL) {
> if (dst->obsolete) {
> ip_rt_put(rt);
> @@ -592,6 +595,7 @@
> return NULL;
> }
> }
> +#endif
> return dst;
> }
>

-- Tomek,
"In theory there's no difference between theory and practice, but in practice...."
