SCSI crash in v2.0.29

Rob Hagopian (Rob.Hagopian@vuser.vu.union.edu)
Sat, 19 Apr 1997 19:24:04 -0400


Got this crash the other night while updating the FTP mirror
sites. The drive is a Quantum Atlas; the card is an 825A 1365 board. The
crash brought the whole system down, and the system did not come back up
automagically because it was unable to remount the drive. That was due to
problems initializing the drive/card during SCSI startup. A manual touch of
the restart button solved all the problems.

This sort of smacks of a hardware problem, but I'm not convinced.
The whole setup had been running for weeks with no problems, and we've never
had a problem with either the drive or the controller. This is a stock 2.0.29
kernel on a P120 w/64M of RAM. There's a 3c509B-TP, a DEC 21140A, and 2
NCR825As (only the one Atlas though).

In any case, I think this could have been handled better.
Here's the syslog output (all lines start with 'Apr 17 03:xx:xx virtual
kernel'):

: scsi1 : illegal instruction
: scsi1 : DCMD|DBC=0x4f000000, DNAD=0xfec1ec (virt 0x00fec1ec)
: DSA=0xfec1e8 (virt 0x00fec1e8)
: DSPS=0x0, TEMP=0x3fe07c0 (virt 0x03fe07c0), DMODE=0x20
: SXFER=0x0, SCNTL3=0x0
: BSY phase=MSGIN, 0 bytes in SCSI FIFO
: STEST0=0x77
: scsi1 : DSP 0x3fe07ec (virt 0x03fe07ec) ->
: 0x3fe07ec (virt 0x03fe07ec) : 0x4f000000 0x00000000 (virt 0x00000000)
: 0x3fe07f4 (virt 0x03fe07f4) : 0x80080000 0x03fe2218 (virt 0x03fe2218)
: 0x3fe07fc (virt 0x03fe07fc) : 0x60000200 0x00000000 (virt 0x00000000)
: 0x3fe0804 (virt 0x03fe0804) : 0x4300003c 0x03fe0da4 (virt 0x03fe0da4)
: 0x3fe080c (virt 0x03fe080c) : 0x8e0b0000 0x03fe0814 (virt 0x03fe0814)
: 0x3fe0814 (virt 0x03fe0814) : 0x1e000000 0x00000040 (virt 0x00000040)
: scsi1 : connected (SDID=0x0, SSID=0x80)
: scsi1 : dsa at phys 0xfec1e8 (virt 0x00fec1e8)
: + 64 : dsa_msgout length = 1, data = 0xfec02c (virt 0x00fec02c)
: Identify disconnect allowed lun 0
: + 60 : select_indirect = 0x13000800
: + 56 : dsa_cmnd = 0x3feba04 result = 0xffff,
target = 0, lun = 0, cmd = Write (10) 00 00 6a c0 67 00 00 02 00
: Unable to handle kernel paging request at virtual address cffec218
: current->tss.cr3 = 0098d000, (r3 = 0098d000
: *pde = 00000000
: Oops: 0000
: CPU: 0
: EIP: 0010:[print_dsa+385/572]
: EFLAGS: 00010016
: eax: 0f000030 ebx: 03feba04 ecx: 03fe0068 edx: 03c0000c
: esi: 00000000 edi: 00fec02d ebp: 00fec1e8 esp: 0090ee44
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process rpc.nfsd (pid: 302, process nr: 24, stackpage=0090e000)
: Stack: 00000000 00006200 f1003000 00000000 00000040 03fe0068 001b0f5e
03fe0018
: 00fec1e8 001d5fb0 001d785c 00000001 00000000 00000080 00000000
00000000
: 000001fc 03fe0018 00000008 00000001 4f000000 00fec1e8 03fe081c
03fe0068
: Call Trace: [print_lots+890/916] [intr_dma+1121/1512]
[NCR53c7x0_intr+1732/2400] [sys_socketcall+571/732] [do_fast_IRQ+42/76]
[fast_IRQ11_interrupt+65/112]
: Code: 8b 54 95 00 52 50 68 46 75 1d 00 e8 8b 20 f6 ff 83 c4 0c 85
: Aiee, killing interrupt handler
: scsi : aborting command due to timeout : pid 228545, scsi1, channel 0, id
0, lun 0 Write (10) 00 00 6a c0 67 00 00 02 00
: scsi1 : DANGER : command running, can not abort.
: scsi : aborting command due to timeout : pid 228545, scsi1, channel 0, id
0, lun 0 Write (10) 00 00 6a c0 67 00 00 02 00
: scsi1 : DANGER : command running, can not abort.
: SCSI host 1 abort (pid 228545) timed out - resetting
: SCSI bus is being reset for host 1 channel 0.
: scsi1 : DCMD|DBC=0x4f000000, DNAD=0xfec1ec (virt 0x00fec1ec)
: DSA=0xfec1e8 (virt 0x00fec1e8)
: DSPS=0x0, TEMP=0x3fe07c0 (virt 0x03fe07c0), DMODE=0x20
: SXFER=0x0, SCNTL3=0x0
: BSY phase=MSGIN, 0 bytes in SCSI FIFO
: STEST0=0x77
: scsi1 : DSP 0x3fe07ec (virt 0x03fe07ec) ->
: 0x3fe07ec (virt 0x03fe07ec) : 0x4f000000 0x00000000 (virt 0x00000000)
: 0x3fe07f4 (virt 0x03fe07f4) : 0x80080000 0x03fe2218 (virt 0x03fe2218)
: 0x3fe07fc (virt 0x03fe07fc) : 0x60000200 0x00000000 (virt 0x00000000)
: 0x3fe0804 (virt 0x03fe0804) : 0x4300003c 0x03fe0da4 (virt 0x03fe0da4)
: 0x3fe080c (virt 0x03fe080c) : 0x8e0b0000 0x03fe0814 (virt 0x03fe0814)
: 0x3fe0814 (virt 0x03fe0814) : 0x1e000000 0x00000040 (virt 0x00000040)
: scsi1 : connected (SDID=0x0, SSID=0x80)
: scsi1 : dsa at phys 0xfec1e8 (virt 0x00fec1e8)
: + 64 : dsa_msgout length = 1, data = 0xfec02c (virt 0x00fec02c)
: Identify disconnect allowed lun 0
: + 60 : select_indirect = 0x13000800
: + 56 : dsa_cmnd = 0x3feba04 result = 0xffff,
target = 0, lun = 0, cmd = Write (10) 00 00 6a c0 67 00 00 02 00
: Unable to handle kernel paging request at virtual address cffec218
: current->tss.cr3 = 03fc9000, (r3 = 03fc9000
: *pde = 00000000
: Oops: 0000
: CPU: 0
: EIP: 0010:[print_dsa+385/572]
: EFLAGS: 00010016
: eax: 0f000030 ebx: 03feba04 ecx: 03fe0068 edx: 03c0000c
: esi: 00000000 edi: 00fec02d ebp: 00fec1e8 esp: 03fffe48
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process init (pid: 1, process nr: 1, stackpage=03fff000)
: Stack: 00000000 00006200 f1003000 00000000 00000040 03fe0068 001b0f5e
03fe0018
: 00fec1e8 001d5fb0 001d785c 00000001 00000000 00000080 00000000
03fe0018
: 03fe0018 03fe0068 00000008 00000001 4f000000 00fec1e8 03fe081c
03fe0068
: Call Trace: [print_lots+890/916] [NCR53c7xx_reset+34/304]
[scsi_reset+196/776] [scsi_times_out+115/304] [scsi_main_timeout+134/168]
[timer_bh+240/332] [do_bottom_half+59/96]
: [handle_bottom_half+11/24] [sync_buffers+85/396] [fsync_dev+14/48]
[sys_sync+7/16] [system_call+85/124]
: Code: 8b 54 95 00 52 50 68 46 75 1d 00 e8 8b 20 f6 ff 83 c4 0c 85
: Aiee, killing interrupt handler
: SCSI host 1 channel 0 reset (pid 228545) timed out - trying harder
: SCSI bus is being reset for host 1 channel 0.
: scsi1 : DCMD|DBC=0x4f000000, DNAD=0xfec1ec (virt 0x00fec1ec)
: DSA=0xfec1e8 (virt 0x00fec1e8)
: DSPS=0x0, TEMP=0x3fe07c0 (virt 0x03fe07c0), DMODE=0x20
: SXFER=0x0, SCNTL3=0x0
: BSY phase=MSGIN, 0 bytes in SCSI FIFO
: STEST0=0x77
: scsi1 : DSP 0x3fe07ec (virt 0x03fe07ec) ->
: 0x3fe07ec (virt 0x03fe07ec) : 0x4f000000 0x00000000 (virt 0x00000000)
: 0x3fe07f4 (virt 0x03fe07f4) : 0x80080000 0x03fe2218 (virt 0x03fe2218)
: 0x3fe07fc (virt 0x03fe07fc) : 0x60000200 0x00000000 (virt 0x00000000)
: 0x3fe0804 (virt 0x03fe0804) : 0x4300003c 0x03fe0da4 (virt 0x03fe0da4)
: 0x3fe080c (virt 0x03fe080c) : 0x8e0b0000 0x03fe0814 (virt 0x03fe0814)
: 0x3fe0814 (virt 0x03fe0814) : 0x1e000000 0x00000040 (virt 0x00000040)
: scsi1 : connected (SDID=0x0, SSID=0x80)
: scsi1 : dsa at phys 0xfec1e8 (virt 0x00fec1e8)
: + 64 : dsa_msgout length = 1, data = 0xfec02c (virt 0x00fec02c)
: Identify disconnect allowed lun 0
: + 60 : select_indirect = 0x13000800
: + 56 : dsa_cmnd = 0x3feba04 result = 0xffff,
target = 0, lun = 0, cmd = Write (10) 00 00 6a c0 67 00 00 02 00
: Unable to handle kernel paging request at virtual address cffec218
: current->tss.cr3 = 022bf000, (r3 = 022bf000
: *pde = 00000000
: Oops: 0000
: CPU: 0
: EIP: 0010:[print_dsa+385/572]
: EFLAGS: 00010016
: eax: 0f000030 ebx: 03feba04 ecx: 03fe0068 edx: 03c0000c
: esi: 00000000 edi: 00fec02d ebp: 00fec1e8 esp: 019f6bcc
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process in.ftpd (pid: 28867, process nr: 92, stackpage=019f6000)
: Stack: 00000000 00006200 f1003000 00000000 00000040 03fe0068 001b0f5e
03fe0018
: 00fec1e8 001d5fb0 001d785c 00000001 00000000 00000080 00000000
03fe0018
: 03fe0018 03fe0068 00000008 00000001 4f000000 00fec1e8 03fe081c
03fe0068
: Call Trace: [print_lots+890/916] [NCR53c7xx_reset+34/304]
[scsi_reset+196/776] [scsi_times_out+191/304] [scsi_main_timeout+134/168]
[timer_bh+240/332] [do_bottom_half+59/96]
: [handle_bottom_half+11/24] [scsi_do_cmd+768/940]
[requeue_sd_request+3458/3472] [rw_intr+0/1212] [def_callback2+38/44]
[udp_rcv+1176/1196] [tulip:tulip_probe+6396/12308]
[tulip:tulip_probe+-8194/12308]
: [ip_rcv+967/1156] [tulip:tulip_probe+-8116/12308]
[allocate_device+710/800] [do_sd_request+358/376] [add_request+507/516]
[make_request+1007/1020]
[ll_rw_block+349/468] [brw_page+638/860]
: [generic_readpage+115/128] [output_byte+36/240]
[try_to_read_ahead+37/256] [try_to_read_ahead+240/256]
[generic_file_read+680/1460] [output_byte+36/240]
[floppy_hardint+144/176] [fat_read_super+252/1236]
: [sys_read+138/176] [system_call+85/124]
: Code: 8b 54 95 00 52 50 68 46 75 1d 00 e8 8b 20 f6 ff 83 c4 0c 85
: Aiee, killing interrupt handler
: SCSI host 1 reset (pid 228545) timed out again -
: probably an unrecoverable SCSI bus or device hang.
: Unable to handle kernel paging request at virtual address c4000000
: current->tss.cr3 = 03f78000, (r3 = 03f78000
: *pde = 00000000
: Oops: 0000
: CPU: 0
: EIP: 0010:[NCR53c8xx_dsa_fixup+102/1328]
: EFLAGS: 00010016
: eax: 0f000168 ebx: 03c0005a ecx: 03bf81da edx: 03fe0600
: esi: 04000000 edi: 0100fa80 ebp: 00ff0018 esp: 01181d20
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process update (pid: 496, process nr: 57, stackpage=01181000)
: Stack: 03fe0068 03feb60c 00ff01e8 00ff0018 001c8bf7 0192f247 0192f2a4
0014fef0
: 0090c098 0192f2a4 00000000 0192f237 0192f2a4 001e1654 04860050
00151700
: 0090c098 0192f2a4 0192f2a4 001e1654 00000046 03fe0068 03feb60c
03feb60c
: Call Trace: [ipxitf_rcv+248/256] [tulip:tulip_probe+-8116/12308]
[ipx_rcv+148/156] [create_cmd+602/3532] [do_bottom_half+59/96]
[handle_bottom_half+11/24]
[NCR53c7xx_queue_command+385/500]
: [scsi_do_cmd+865/940] [scsi_done+0/1672]
[requeue_sd_request+3458/3472] [rw_intr+0/1212] [ide_set_handler+38/44]
[triton_dmaproc+259/320] [dma_intr+0/176] [def_callback2+38/44]
: [udp_rcv+1176/1196] [tulip:tulip_probe+6140/12308]
[allocate_device+710/800] [do_sd_request+358/376] [add_request+507/516]
[make_request+1007/1020]
[ll_rw_block+349/468] [sync_old_buffers+177/296]
: [sys_bdflush+53/164] [system_call+85/124]
: Code: f3 a5 a8 02 74 02 66 a5 a8 01 74 01 a4 31 db 8d 76 00 8b 14
: general protection: 0000
: CPU: 0
: EIP: 0010:[get_unused_buffer_head+62/80]
: EFLAGS: 00010202
: eax: 00000051 ebx: 00000051 ecx: 00000000 edx: 53f000fe
: esi: 00000000 edi: 00000400 ebp: 01910000 esp: 03040d70
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process mirror (pid: 30307, process nr: 90, stackpage=03040000)
: Stack: 00000051 001246f1 00000001 00000400 01910000 03040df4 00124d0a
00124d23
: 01910000 00000400 00000000 00000001 00200801 00123517 00000000
00000400
: 00000000 00000001 00200801 0015a828 0000f000 00267b30 03040ef4
037819a8
: Call Trace: [create_buffers+33/144] [grow_buffers+70/244]
[grow_buffers+95/244] [refill_freelist+95/1360] [ext2_new_block+2248/2272]
[do_bottom_half+59/96] [getblk+58/1128]
: [getblk+454/1128] [ext2_alloc_block+128/412]
[block_getblk+348/608] [ext2_getblk+385/556] [ext2_file_write+389/1116]
[inet_recvmsg+114/136] [sock_read+171/192] [sys_write+271/328]
: [system_call+85/124]
: Code: 8b 42 18 a3 5c 0e 1e 00 89 d0 5b c3 8d 36 31 c0 5b c3 83 ec
: general protection: 0000
: CPU: 0
: EIP: 0010:[get_unused_buffer_head+62/80]
: EFLAGS: 00010202
: eax: 00001000 ebx: 00000000 ecx: 02680000 edx: 53f000fe
: esi: 00000000 edi: 00001000 ebp: 02680000 esp: 0302fe90
: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
: Process mm (pid: 30304, process nr: 85, stackpage=0302f000)
: Stack: 00000000 001246f1 0302ff34 00000000 00000000 02c531f0 01d28998
001247bb
: 02680000 00001000 0302ff34 00000001 00000000 02c531f0 031f4ef0
02680000
: 034ee62c 031f4ef0 031f0900 031f4c2c 0015dec4 03a25ba0 0015deee
0302ff30
: Call Trace: [create_buffers+33/144] [brw_page+91/860]
[ext2_lookup+180/368] [ext2_lookup+222/368] [generic_readpage+115/128]
[generic_file_read+1038/1460]
[generic_file_read+1250/1460]
: [sys_read+138/176] [system_call+85/124]
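
A few numbers in the log above can be cross-checked by hand. The sketch
below is only a sanity check, not a diagnosis, and it rests on assumptions
that the log itself does not state: (1) that the i386 2.0 kernel data
segment has base 0xC0000000, so the reported "virtual address" is that base
plus the offset the code used; (2) that the "Code:" bytes start at EIP, so
"8b 54 95 00" is mov edx,[ebp+edx*4+0] and "f3 a5" is a rep movs reading
from esi; (3) that the nine bytes printed after "Write (10)" are bytes 1..9
of the ten-byte CDB. The helper names are made up for illustration.

```python
KERNEL_BASE = 0xC0000000      # assumption: kernel segment base on i386 2.0
RAM_BYTES = 64 * 1024 * 1024  # "64M of RAM" from the post


def linear(offset):
    """Linear address for a kernel-segment offset (32-bit wraparound)."""
    return (KERNEL_BASE + offset) & 0xFFFFFFFF


def decode_write10(tail_hex):
    """Decode the nine bytes printed after 'Write (10)' (assumed CDB 1..9)."""
    b = bytes.fromhex(tail_hex.replace(" ", ""))
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    return {
        "flags": b[0],
        "lba": int.from_bytes(b[1:5], "big"),     # logical block address
        "blocks": int.from_bytes(b[6:8], "big"),  # transfer length
        "control": b[8],
    }


# print_dsa oops: mov edx,[ebp+edx*4], registers taken from the dump
ebp, edx = 0x00FEC1E8, 0x03C0000C
print_dsa_offset = ebp + 4 * edx
print(hex(linear(print_dsa_offset)), print_dsa_offset >= RAM_BYTES)

# NCR53c8xx_dsa_fixup oops: rep movs reading from esi
esi = 0x04000000
print(hex(linear(esi)), esi >= RAM_BYTES)

cdb = decode_write10("00 00 6a c0 67 00 00 02 00")
print(cdb["lba"], cdb["blocks"])
```

Under those assumptions the two computed addresses come out as 0xcffec218
and 0xc4000000, the same values the kernel reported, and both offsets lie
at or beyond the 64M of physical memory. That would be consistent with the
driver following a bad pointer or index while dumping its own state, but it
does not say whether the original "illegal instruction" was caused by
hardware or software. The decoded command is a 2-block write at block
6996071.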