Bug Report: 2.4.22 3ware hang on dying drive

From: Stephan von Krawczynski
Date: Sun Aug 31 2003 - 16:36:35 EST


Hello,

I just experienced this freeze during a verify cycle of a 3ware raid 5 config with 3 drives.
Entries in logfiles:

3w-xxxx: scsi2: AEN: INFO: Verify started: Unit #0.
3w-xxxx: scsi2: Unit #0: Command (f72d7400) timed out, resetting card.

And then:

ksymoops 2.4.8 on i686 2.4.22. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22/ (default)
-m /boot/System.map-2.4.22 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

NMI Watchdog detected LOCKUP on CPU1, eip c02b3cd6, registers:
CPU: 1
EIP: 0010:[<c02b3cd6>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000002
eax: 00466307 ebx: 006ab97c ecx: 1152d298 edx: 00029ab9
esi: 00009004 edi: 00042000 ebp: f735ded4 esp: f735dec0
ds: 0018 es: 0018 ss: 0018
Process scsi_eh_2 (pid: 69, stackpage=f735d000)
Stack: 00000000 f88dd2db 006ab97c 130055b2 00000000 3f522cb5 00064f95 3f522cb3
0008c524 f735407c 00000000 00009008 00000000 f88da29b f735407c 00042000
0000001e 00155810 00155810 00000000 00000000 0000900c 00000000 f735407c
Call Trace: [<f88dd2db>] [<f88da29b>] [<f88dd673>] [<f88dd543>] [<f88dd8e5>]
[<c0213cef>] [<c021438e>] [<f88e3220>] [<c0214b66>] [<f88e3220>] [<c010592e>]
[<c0214a50>]
Code: 39 d8 72 f6 5b c3 8d 74 26 00 8b 44 24 04 eb 0a 8d 76 00 8d


>>EIP; c02b3cd6 <__rdtsc_delay+16/20> <=====

>>ebp; f735ded4 <_end+36f96874/38512a00>
>>esp; f735dec0 <_end+36f96860/38512a00>

Trace; f88dd2db <[3w-xxxx]tw_poll_status+6b/d0>
Trace; f88da29b <[3w-xxxx]tw_aen_drain_queue+4b/370>
Trace; f88dd673 <[3w-xxxx]tw_reset_sequence+23/e0>
Trace; f88dd543 <[3w-xxxx]tw_reset_device_extension+13/120>
Trace; f88dd8e5 <[3w-xxxx]tw_scsi_eh_abort+f5/1e0>
Trace; c0213cef <scsi_try_to_abort_command+4f/60>
Trace; c021438e <scsi_unjam_host+18e/850>
Trace; f88e3220 <[3w-xxxx]driver_template+0/6b>
Trace; c0214b66 <scsi_error_handler+116/180>
Trace; f88e3220 <[3w-xxxx]driver_template+0/6b>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; c0214a50 <scsi_error_handler+0/180>

Code; c02b3cd6 <__rdtsc_delay+16/20>
00000000 <_EIP>:
Code; c02b3cd6 <__rdtsc_delay+16/20> <=====
0: 39 d8 cmp %ebx,%eax <=====
Code; c02b3cd8 <__rdtsc_delay+18/20>
2: 72 f6 jb fffffffa <_EIP+0xfffffffa>
Code; c02b3cda <__rdtsc_delay+1a/20>
4: 5b pop %ebx
Code; c02b3cdb <__rdtsc_delay+1b/20>
5: c3 ret
Code; c02b3cdc <__rdtsc_delay+1c/20>
6: 8d 74 26 00 lea 0x0(%esi,1),%esi
Code; c02b3ce0 <__loop_delay+0/30>
a: 8b 44 24 04 mov 0x4(%esp,1),%eax
Code; c02b3ce4 <__loop_delay+4/30>
e: eb 0a jmp 1a <_EIP+0x1a>
Code; c02b3ce6 <__loop_delay+6/30>
10: 8d 76 00 lea 0x0(%esi),%esi
Code; c02b3ce9 <__loop_delay+9/30>
13: 8d 00 lea (%eax),%eax


1 warning issued. Results may not be reliable.

After reboot drive 1 (from 0-2) was dead. Replacement and rebuild worked.
If you have further questions, feel free to ask.

Regards,
Stephan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/