Re: PROBLEM: 2.2.5 unstable on Dell PC, 2.0.36 is stable

Richard Black (rjb@dcs.gla.ac.uk)
Thu, 05 Aug 99 16:51:57 +0100


I think I may have managed to get a useful oops dump out of it on this
occasion; it looks like either a bug in the 3c59x driver, or a bug in
interrupt handling causing the networking code to be inappropriately
re-entered.

This is with my modified trap.c, which prints things sensibly. After
typing the dump in by hand I add back the screen-real-estate-wasting
[<>] brackets to keep ksymoops happy.

I get the following:

easter:/grasp_tmp9/linux-2.2.10> ksymoops < ~/tmp/oops3.txt
ksymoops 0.7c on i686 2.2.10. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.2.10/ (default)
-m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

No modules in ksyms, skipping objects
current->tss.cr3 = 00101000, %cr3 = 00101000
Oops: 0002
Stack: [<00000000>] [<00000013>] [<c9b65004>] [<0000001f>]
Call Trace: [<d08823ff>] [<c0109e1d>] [<c010e96e>] [<c0109f8f>]
[<c0107b08>] [<c017ae18>] [<c016546f>] [<c01146db>] [<c01658d6>]
[<c0176113>] [<c0160769>] [<c01190b5>] [<c0109fa6>] [<c0107b08>]
[<c0106251>] [<c0106000>] [<c0106000>] [<c01001b1>]
Code: f0 ff 49 70 0f 94 44 24 24 8b 5c 24 34 66 83 c3 1c 80 7c 24
Using defaults from ksymoops -t elf32-i386 -a i386

Trace; d08823ff <END_OF_CODE+105f979b/????>
Trace; c0109e1d <handle_IRQ_event+55/88>
Trace; c010e96e <do_level_ioapic_IRQ+62/a0>
Trace; c0109f8f <do_IRQ+3b/5c>
Trace; c0107b08 <ret_from_intr+0/20>
Trace; c017ae18 <fn_hash_lookup+8c/d4>
Trace; c016546f <ip_route_input_slow+15f/4c4>
Trace; c01146db <printk+17f/18c>
Trace; c01658d6 <ip_route_input+102/128>
Trace; c0176113 <arp_rcv+157/32c>
Trace; c0160769 <net_bh+191/1f4>
Trace; c01190b5 <do_bottom_half+81/a0>
Trace; c0109fa6 <do_IRQ+52/5c>
Trace; c0107b08 <ret_from_intr+0/20>
Trace; c0106251 <cpu_idle+41/54>
Trace; c0106000 <get_options+0/74>
Trace; c0106000 <get_options+0/74>
Trace; c01001b1 <L6+0/2>
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: f0 ff 49 70 lock decl 0x70(%ecx)
Code; 00000004 Before first symbol
4: 0f 94 44 24 24 sete 0x24(%esp,1)
Code; 00000009 Before first symbol
9: 8b 5c 24 34 movl 0x34(%esp,1),%ebx
Code; 0000000d Before first symbol
d: 66 83 c3 1c addw $0x1c,%bx
Code; 00000011 Before first symbol
11: 80 7c 24 00 00 cmpb $0x0,0x0(%esp,1)

CPU: 0
EFLAGS: 00010202
Aieee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

1 warning issued. Results may not be reliable.

Now note that the very top level of the backtrace is in a module, and
that ksymoops can't cope with modules. But I know from /proc/modules
that the only module loaded is 3c59x.o, and sure enough that "Code"
sequence occurs precisely once in 3c59x.o.
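
A rough sketch of how such a uniqueness check could be scripted. This is
not necessarily how the check above was done; the helper names are made
up for illustration, and the search is over the raw bytes of the whole
file rather than just its text section:

```python
from pathlib import Path

# Bytes copied from the "Code:" line of the oops above.
OOPS_CODE = "f0 ff 49 70 0f 94 44 24 24 8b 5c 24 34 66 83 c3 1c 80 7c 24"


def parse_code_line(hex_text: str) -> bytes:
    """Turn space-separated hex pairs into a bytes object."""
    return bytes.fromhex(hex_text)


def count_occurrences(haystack: bytes, needle: bytes) -> int:
    """Count every match of needle in haystack, including overlapping ones."""
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        pos = haystack.find(needle, pos + 1)
    return count


def count_in_file(path: str, hex_text: str = OOPS_CODE) -> int:
    """Count matches of the oops code bytes in the file at path (hypothetical helper)."""
    return count_occurrences(Path(path).read_bytes(), parse_code_line(hex_text))
```

Called on the module object, for example count_in_file("3c59x.o"), a
result of 1 would agree with the claim that the sequence is unique.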

So I tediously reverse-engineered the make system so that I could
rebuild just 3c59x.c, changing nothing except to add -g (the make
system likes to fiddle with various header files every time it runs),
with:

$ gcc -D__KERNEL__ -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer \
    -D__SMP__ -pipe -fno-strength-reduce -m486 -malign-loops=2 \
    -malign-jumps=2 -malign-functions=2 -DCPU=686 -DMODULE -DMODVERSIONS \
    -include /export/grasp_tmp9/linux-2.2.10/include/linux/modversions.h \
    -I/export/grasp_tmp9/linux-2.2.10/include -g -c 3c59x.c

I then used objdump --disassemble --source --line-numbers to look at
the source around the code sequence where the crash is:

/export/grasp_tmp9/linux-2.2.10/drivers/net/3c59x.c:1634
24d8: 8b 54 24 30 movl 0x30(%esp,1),%edx
24dc: f6 c6 01 testb $0x1,%dh
24df: 74 6f je 2550 <vortex_interrupt+0x398>
/export/grasp_tmp9/linux-2.2.10/drivers/net/3c59x.c:1635
24e1: 8b 4c 24 34 movl 0x34(%esp,1),%ecx
24e5: 83 c1 0c addl $0xc,%ecx
/export/grasp_tmp9/linux-2.2.10/include/asm/io.h:81
#define RETURN_TYPE unsigned char
__IN(b,"")
#undef RETURN_TYPE
#define RETURN_TYPE unsigned short
__IN(w,"")
24e8: 89 ca movl %ecx,%edx
24ea: 66 ed inw (%dx),%ax
/export/grasp_tmp9/linux-2.2.10/drivers/net/3c59x.c:1635
24ec: f6 c4 10 testb $0x10,%ah
24ef: 74 5f je 2550 <vortex_interrupt+0x398>
/export/grasp_tmp9/linux-2.2.10/include/asm/io.h:88
__IN(l,"")
#undef RETURN_TYPE

__OUT(b,"b",char)
__OUT(w,"w",short)
24f1: b8 00 10 00 00 movl $0x1000,%eax
24f6: 66 ef outw %ax,(%dx)
/export/grasp_tmp9/linux-2.2.10/include/linux/skbuff.h:175
return (list->next == (struct sk_buff *) list);
}

extern __inline__ void kfree_skb(struct sk_buff *skb)
{
24f8: 8b 54 24 38 movl 0x38(%esp,1),%edx
24fc: 8b 8a 34 04 00 00 movl 0x434(%edx),%ecx
/export/grasp_tmp9/linux-2.2.10/include/asm/atomic.h:69
static __inline__ int atomic_dec_and_test(volatile atomic_t *v)
{
unsigned char c;

__asm__ __volatile__(
2502: f0 ff 49 70 lock decl 0x70(%ecx)
2506: 0f 94 44 24 24 sete 0x24(%esp,1)
/export/grasp_tmp9/linux-2.2.10/include/linux/skbuff.h:176
}

extern __inline__ void kfree_skb(struct sk_buff *skb)
{
if (atomic_dec_and_test(&skb->users))
250b: 8b 5c 24 34 movl 0x34(%esp,1),%ebx
250f: 66 83 c3 1c addw $0x1c,%bx
2513: 80 7c 24 24 00 cmpb $0x0,0x24(%esp,1)
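
The listing also pins down which function the matched bytes belong to:
the jump targets are printed as 2550 <vortex_interrupt+0x398>, so the
start of vortex_interrupt and the offset of the lock decl at 2502 follow
by subtraction. A small worked sketch, using only addresses from the
listing (the function and variable names are illustrative):

```python
def function_start(target_addr: int, target_offset: int) -> int:
    """Start of a function, given an address printed as <func+offset>."""
    return target_addr - target_offset


def offset_in_function(addr: int, start: int) -> int:
    """Offset of addr from the start of its function."""
    return addr - start


# "je 2550 <vortex_interrupt+0x398>" gives the function start.
VORTEX_INTERRUPT_START = function_start(0x2550, 0x398)

# The oops "Code:" bytes begin at the lock decl at 0x2502.
FAULT_OFFSET = offset_in_function(0x2502, VORTEX_INTERRUPT_START)

print(f"vortex_interrupt starts at {VORTEX_INTERRUPT_START:#x}")      # 0x21b8
print(f"faulting instruction is vortex_interrupt+{FAULT_OFFSET:#x}")  # 0x34a
```

These are addresses within the rebuilt 3c59x.o, not the addresses the
module had once loaded into the running kernel.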

So it looks to me like 3c59x.c line 1637 did a DEV_FREE_SKB on a NULL
tx_skb, causing a crash in the locked "users" decrement.
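
The "Oops: 0002" line fits that reading, assuming the oops came through
the page-fault path. On i386 the page-fault error code uses bit 0 for
"page was present" (a protection fault rather than a missing page),
bit 1 for a write access and bit 2 for user mode. A minimal decoding
sketch (the function name is hypothetical):

```python
def decode_page_fault_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0: page not present, 1: protection fault
        "write": bool(code & 0x2),         # 0: read access, 1: write access
        "user_mode": bool(code & 0x4),     # 0: kernel mode, 1: user mode
    }


print(decode_page_fault_code(0x0002))
```

For 0002 this gives a kernel-mode write to a page that is not present,
which is what a lock decl through a NULL pointer plus 0x70 would produce.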

Is this detailed enough for you to look into?

Richard.
