ATA Write Error and Time-out Notification in User Space

From: John Treubig
Date: Tue Dec 20 2005 - 16:54:47 EST


Where would I look in the LibATA/SCSI chain to permit Write Error and Time-out notification to be passed back to user space without hanging the system?

BACKGROUND: We've implemented a disk drive testing system under 2.6 using LibATA to access the ATA devices. During testing, we attach additional "test" drives to the system and perform reads and writes to these drives. The system drive is not part of the test environment. During testing we expect to see errors reported (read, write and time-out) from the "test" drives. When a SCSI disks report errors, the SCSI handlers perform as expected, reporting the error and recovering. When ATA drives report errors, only read errors recover and we are able to capture the error. Write and time-out errors hang the system.

Hardware tested:
Promise Ultra133 TX2 (PDC20269 chip; pata_pdc2027x driver)

Kernel versions tested:
2.6.11
2.6.14 rc2
2.6.15 rc5

RECREATING THE PROBLEM: In our situation, we see the write and time-out failures when we screen drives beyond their commercial temperature limits for military applications using a custom application to read and write to the "test" drives. Rather than having to have a drive in a special temperature environment, the easiest way to simulate the failures is to unplug power to a test drive while it is under test. The resulting errors and time-outs will hang the system. Under this scenario, any type of read or write to the "test" drive will fail the same as with our application. I've attached copies of the system messages and startup-log to give further details into the hangs.

---
Best wishes,
John Treubig
VT Miltope
Senior Test Engineer
(334) 613-6495

write back
[ 115.889416] ata_scsi_dump_cdb: CDB (2:0,0,0) 00 00 00 00 00 00 00 00 24
[ 115.916707] ata_scsiop_noop: ENTER
[ 115.943552] ata_scsi_dump_cdb: CDB (2:0,0,0) 25 00 00 00 00 00 00 00 00
[ 115.970878] ata_scsiop_read_cap: ENTER
[ 115.997800] SCSI device sdc: 78138047 512-byte hdwr sectors (40007 MB)
[ 116.025414] ata_scsi_dump_cdb: CDB (2:0,0,0) 5a 00 08 00 00 00 00 00 08
[ 116.053090] ata_scsiop_mode_sense: ENTER
[ 116.080352] ata_scsi_dump_cdb: CDB (2:0,0,0) 5a 00 08 00 00 00 00 00 24
[ 116.108002] ata_scsiop_mode_sense: ENTER
[ 116.135220] SCSI device sdc: drive cache: write back
[ 116.162352] sdc:<3>ata_scsi_dump_cdb: CDB (2:0,0,0) 28 00 00 00 00 00 00 00 08
[ 116.189890] ata_scsi_translate: ENTER
[ 116.217022] scsi_10_lba_len: ten-byte command
[ 116.244166] ata_sg_setup: ENTER, ata2
[ 116.271090] ata_sg_setup: 1 sg elements mapped
[ 116.297906] ata_fill_sg: PRD[0] = (0xFD0A000, 0x1000)
[ 116.324556] ata_dev_select: ENTER, ata2: device 0, wait 1
[ 116.351065] ata_tf_load_mmio: feat 0x0 nsect 0x8 lba 0x0 0x0 0x0
[ 116.377858] ata_tf_load_mmio: device 0xE0
[ 116.404290] ata_exec_command_mmio: ata2: cmd 0xC8
[ 116.431008] ata_scsi_translate: EXIT
[ 116.835832] ata_host_intr: ata2: host_stat 0x4
[ 116.862205] ata_host_intr: ata2: protocol 4 (dev_stat 0x50)
[ 116.888452] ata_sg_clean: unmapping 1 sg elements
[ 116.914752] ata_qc_complete: EXIT
[ 116.940787] unknown partition table
[ 116.966767] sd 4:0:0:0: Attached scsi disk sdc
[ 116.994509] sr0: scsi-1 drive
[ 117.020498] Uniform CD-ROM driver Revision: 3.20
[ 117.046702] sr 2:0:0:0: Attached scsi CD-ROM sr0
[ 117.046756] sd 0:0:6:0: Attached scsi generic sg0 type 0
[ 117.073246] sr 2:0:0:0: Attached scsi generic sg1 type 5
[ 117.099473] sd 3:0:0:0: Attached scsi generic sg2 type 0
[ 117.125325] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 117.150525] Fusion MPT base driver 3.03.04
[ 117.175476] Copyright (c) 1999-2005 LSI Logic Corporation
[ 117.200672] Fusion MPT SPI Host driver 3.03.04
[ 117.225785] Fusion MPT FC Host driver 3.03.04
[ 117.250437] Fusion MPT SAS Host driver 3.03.04
[ 117.274695] Fusion MPT misc device (ioctl) driver 3.03.04
[ 117.298994] mptctl: Registered with Fusion MPT base driver
[ 117.323331] mptctl: /dev/mptctl @ (major,minor=10,220)
[ 117.348061] ACPI: PCI Interrupt Link [APCL] enabled at IRQ 22
[ 117.372545] ACPI: PCI Interrupt 0000:00:02.2[C] -> Link [APCL] -> GSI 22 (level, high) -> IRQ 20
[ 117.397705] PCI: Setting latency timer of device 0000:00:02.2 to 64
[ 117.397709] ehci_hcd 0000:00:02.2: EHCI Host Controller
[ 117.422670] ehci_hcd 0000:00:02.2: debug port 1
[ 117.447214] PCI: cache line size of 64 is not supported by device 0000:00:02.2
[ 117.447282] ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
[ 117.472256] ehci_hcd 0000:00:02.2: irq 20, io mem 0xe0005000
[ 117.496766] ehci_hcd 0000:00:02.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
[ 117.521393] hub 1-0:1.0: USB hub found
[ 117.545622] hub 1-0:1.0: 6 ports detected
[ 117.670136] ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
[ 117.670519] ACPI: PCI Interrupt Link [APCF] enabled at IRQ 21
[ 117.694783] ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [APCF] -> GSI 21 (level, high) -> IRQ 21
[ 117.719857] PCI: Setting latency timer of device 0000:00:02.0 to 64
[ 117.719861] ohci_hcd 0000:00:02.0: OHCI Host Controller
[ 117.744608] ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
[ 117.769437] ohci_hcd 0000:00:02.0: irq 21, io mem 0xe0003000
[ 117.846898] hub 2-0:1.0: USB hub found
[ 117.871420] hub 2-0:1.0: 3 ports detected
[ 117.996976] ACPI: PCI Interrupt Link [APCG] enabled at IRQ 20
[ 118.021855] ACPI: PCI Interrupt 0000:00:02.1[B] -> Link [APCG] -> GSI 20 (level, high) -> IRQ 22
[ 118.047521] PCI: Setting latency timer of device 0000:00:02.1 to 64
[ 118.047524] ohci_hcd 0000:00:02.1: OHCI Host Controller
[ 118.072995] ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 3
[ 118.098933] ohci_hcd 0000:00:02.1: irq 22, io mem 0xe0004000
[ 118.177418] hub 3-0:1.0: USB hub found
[ 118.203101] hub 3-0:1.0: 3 ports detected
[ 118.329153] USB Universal Host Controller Interface driver v2.3
[ 118.355385] usbcore: registered new driver usblp
[ 118.381437] drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
[ 118.408307] Initializing USB Mass Storage driver...
[ 118.435370] usbcore: registered new driver usb-storage
[ 118.462560] USB Mass Storage support registered.
[ 118.489815] usbcore: registered new driver usbhid
[ 118.517024] drivers/usb/input/hid-core.c: v2.6:USB HID core driver
[ 118.544622] mice: PS/2 mouse device common for all mice
[ 118.573058] NET: Registered protocol family 2
[ 118.609747] IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
[ 118.639384] TCP established hash table entries: 16384 (order: 4, 65536 bytes)
[ 118.669324] TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
[ 118.699306] TCP: Hash tables configured (established 16384 bind 16384)
[ 118.729249] TCP reno registered
[ 118.759060] ip_conntrack version 2.4 (2047 buckets, 16376 max) - 212 bytes per conntrack
[ 118.798121] input: AT Translated Set 2 keyboard as /class/input/input0
[ 118.845373] ip_tables: (C) 2000-2002 Netfilter core team
[ 118.918268] ipt_recent v0.3.1: Stephen Frost <sfrost@xxxxxxxxxxx>. http://snowman.net/projects/ipt_recent/
[ 118.951157] arp_tables: (C) 2002 David S. Miller
[ 118.992131] TCP bic registered
[ 119.024260] NET: Registered protocol family 1
[ 119.056060] NET: Registered protocol family 17
[ 119.087602] Using IPI Shortcut mode
[ 119.341785] input: ImPS/2 Generic Wheel Mouse as /class/input/input1
[ 119.543784] VFS: Mounted root (ext2 filesystem) readonly.
[ 119.575808] Freeing unused kernel memory: 216k freed
[ 124.822183] Adding 1188800k swap on /dev/hda3. Priority:-1 extents:1 across:1188800k
[ 125.145728] radDIO: module license 'unspecified' taints kernel.
[ 125.152977] RAD-DIO driver for NuDAQ PCI-7224 V1.0
[ 125.152982] Developed for Miltope Corp. by Radical Systems, Inc.
[ 125.152984] http://www.radicalsystems.com
[ 125.152986] Copyright (c) 2004, Radical Systems, Inc.
[ 125.152988] All rights reserved.
[ 125.159787] Probing...
[ 125.159808] ACPI: PCI Interrupt 0000:01:06.0[A] -> Link [APC3] -> GSI 18 (level, high) -> IRQ 19
[ 125.159819] Found Device:
[ 125.159821] Name: 0000:01:06.0
[ 125.159823] InitFlag: 0x7248
[ 125.159824] BusNo: 0x1
[ 125.159826] DevFunc: 0x30
[ 125.159827] LCRBase: 0x7000
[ 125.159829] BaseAddr: 0x7400
[ 125.159830] IRQ No: 0x5
[ 125.218473] sata_promise 0000:01:0a.0: version 1.03
[ 125.218503] ACPI: PCI Interrupt 0000:01:0a.0[A] -> Link [APC3] -> GSI 18 (level, high) -> IRQ 19
[ 125.228953] ata_device_add: ENTER
[ 125.228955] ata_host_add: ENTER
[ 125.228996] ata_port_start: prd alloc, virt cf01a000, dma f01a000
[ 125.229001] ata5: SATA max UDMA/133 cmd 0xD086A200 ctl 0xD086A238 bmdma 0x0 irq 19
[ 125.229006] ata_host_add: ENTER
[ 125.229023] ata_port_start: prd alloc, virt cf195000, dma f195000
[ 125.229027] ata6: SATA max UDMA/133 cmd 0xD086A280 ctl 0xD086A2B8 bmdma 0x0 irq 19
[ 125.229032] ata_host_add: ENTER
[ 125.229049] ata_port_start: prd alloc, virt cf025000, dma f025000
[ 125.229053] ata7: SATA max UDMA/133 cmd 0xD086A300 ctl 0xD086A338 bmdma 0x0 irq 19
[ 125.229058] ata_host_add: ENTER
[ 125.229076] ata_port_start: prd alloc, virt cf21d000, dma f21d000
[ 125.229080] ata8: SATA max UDMA/133 cmd 0xD086A380 ctl 0xD086A3B8 bmdma 0x0 irq 19
[ 125.229087] ata_device_add: probe begin
[ 125.229089] ata_device_add: ata5: probe begin
[ 125.429660] ata5: no device found (phy stat 00000000)
[ 125.429663] ata_device_add: ata5: probe end
[ 125.429666] scsi7 : sata_promise
[ 125.436545] ata_device_add: ata6: probe begin
[ 125.637365] ata_bus_reset: ENTER, host 6, port 1
[ 125.637377] ata_bus_softreset: ata6: bus reset via SRST
[ 125.893001] ata_dev_classify: found ATA device by sig
[ 125.893022] ata_bus_reset: EXIT
[ 125.893024] ata_dev_identify: ENTER, host 6, dev 0
[ 125.893027] ata_dev_select: ENTER, ata6: device 0, wait 1
[ 125.893054] ata_dev_identify: do ATA identify
[ 125.893057] pdc_qc_prep: ENTER
[ 125.893059] ata_dev_select: ENTER, ata6: device 0, wait 1
[ 125.893110] ata_exec_command_mmio: ata6: cmd 0xEC
[ 125.895981] ata_pio_sector: data read
[ 125.896332] ata_qc_complete: EXIT
[ 125.896346] ata6: dev 0 cfg 49:2f00 82:346b 83:5b01 84:4003 85:3469 86:1801 87:4003 88:407f
[ 125.896350] ata_dump_id: 49==0x2f00 53==0x0007 63==0x0007 64==0x0003 75==0x0000
[ 125.896353] ata_dump_id: 80==0x007e 81==0x001b 82==0x346b 83==0x5b01 84==0x4003
[ 125.896356] ata_dump_id: 88==0x407f 93==0x604f
[ 125.896360] ata6: dev 0 ATA-6, max UDMA/133, 234441648 sectors: LBA
[ 125.896364] ata_dev_identify: EXIT, drv_stat = 0x50
[ 125.896366] ata6(0): applying bridge limits
[ 125.896369] ata_dev_identify: ENTER/EXIT (host 6, dev 1) -- nodev
[ 125.896373] ata_host_set_pio: base 0x8 xfer_mode 0xc mask 0x1f x 4
[ 125.896378] ata_dev_set_xfermode: set features - xfer mode
[ 125.896380] pdc_qc_prep: ENTER
[ 125.896383] pdc_packet_start: ENTER, ap cf6ca284
[ 125.896514] pdc_interrupt: ENTER
[ 125.896517] pdc_interrupt: port 0
[ 125.896519] pdc_interrupt: port 1
[ 125.896533] ata_qc_complete: EXIT
[ 125.896535] pdc_interrupt: port 2
[ 125.896536] pdc_interrupt: port 3
[ 125.896538] pdc_interrupt: EXIT
[ 125.896540] ata_dev_set_xfermode: EXIT
[ 125.896543] ata_dev_set_mode: idx=5 xfer_shift=0, xfer_mode=0x45, base=0x40, offset=5
[ 125.896546] ata6: dev 0 configured for UDMA/100
[ 125.896548] ata_device_add: ata6: probe end
[ 125.896550] scsi8 : sata_promise
[ 125.903442] ata_device_add: ata7: probe begin
[ 126.103668] ata7: no device found (phy stat 00000000)
[ 126.103671] ata_device_add: ata7: probe end
[ 126.103674] scsi9 : sata_promise
[ 126.110431] ata_device_add: ata8: probe begin
[ 126.310364] ata8: no device found (phy stat 00000000)
[ 126.310367] ata_device_add: ata8: probe end
[ 126.310370] scsi10 : sata_promise
[ 126.317094] ata_device_add: probe begin
[ 126.317175] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 00 00 00 24 00 00 00 00
[ 126.317179] ata_scsiop_inq_std: ENTER
[ 126.317216] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 00 00 00 60 00 00 00 00
[ 126.317219] ata_scsiop_inq_std: ENTER
[ 126.317233] Vendor: ATA Model: ST3120023AS Rev: 3.01
[ 126.317241] Type: Direct-Access ANSI SCSI revision: 05
[ 126.324097] ata_scsi_dump_cdb: CDB (6:0,0,0) 00 00 00 00 00 00 00 00 00
[ 126.324103] ata_scsiop_noop: ENTER
[ 126.324133] ata_scsi_dump_cdb: CDB (6:0,0,0) 25 00 00 00 00 00 00 00 00
[ 126.324136] ata_scsiop_read_cap: ENTER
[ 126.324146] SCSI device sdd: 234441648 512-byte hdwr sectors (120034 MB)
[ 126.324154] ata_scsi_dump_cdb: CDB (6:0,0,0) 5a 00 08 00 00 00 00 00 08
[ 126.324157] ata_scsiop_mode_sense: ENTER
[ 126.324168] ata_scsi_dump_cdb: CDB (6:0,0,0) 5a 00 08 00 00 00 00 00 24
[ 126.324171] ata_scsiop_mode_sense: ENTER
[ 126.324176] SCSI device sdd: drive cache: write back
[ 126.330966] ata_scsi_dump_cdb: CDB (6:0,0,0) 00 00 00 00 00 00 00 00 24
[ 126.330972] ata_scsiop_noop: ENTER
[ 126.331003] ata_scsi_dump_cdb: CDB (6:0,0,0) 25 00 00 00 00 00 00 00 00
[ 126.331006] ata_scsiop_read_cap: ENTER
[ 126.331016] SCSI device sdd: 234441648 512-byte hdwr sectors (120034 MB)
[ 126.331023] ata_scsi_dump_cdb: CDB (6:0,0,0) 5a 00 08 00 00 00 00 00 08
[ 126.331026] ata_scsiop_mode_sense: ENTER
[ 126.331037] ata_scsi_dump_cdb: CDB (6:0,0,0) 5a 00 08 00 00 00 00 00 24
[ 126.331040] ata_scsiop_mode_sense: ENTER
[ 126.331045] SCSI device sdd: drive cache: write back
[ 126.331048] sdd:<3>ata_scsi_dump_cdb: CDB (6:0,0,0) 28 00 00 00 00 00 00 00 08
[ 126.331079] ata_scsi_translate: ENTER
[ 126.331083] scsi_10_lba_len: ten-byte command
[ 126.331087] ata_sg_setup: ENTER, ata6
[ 126.331090] ata_sg_setup: 1 sg elements mapped
[ 126.331093] pdc_qc_prep: ENTER
[ 126.331096] ata_fill_sg: PRD[0] = (0xF74E000, 0x1000)
[ 126.331099] pdc_packet_start: ENTER, ap cf6ca284
[ 126.331104] ata_scsi_translate: EXIT
[ 126.352866] pdc_interrupt: ENTER
[ 126.352870] pdc_interrupt: port 0
[ 126.352872] pdc_interrupt: port 1
[ 126.352888] ata_sg_clean: unmapping 1 sg elements
[ 126.352890] ata_qc_complete: EXIT
[ 126.352892] pdc_interrupt: port 2
[ 126.352894] pdc_interrupt: port 3
[ 126.352895] pdc_interrupt: EXIT
[ 126.352906] unknown partition table
[ 126.352939] sd 8:0:0:0: Attached scsi disk sdd
[ 126.366494] sd 8:0:0:0: Attached scsi generic sg4 type 0
[ 126.366520] ata_device_add: EXIT, returning 4
[ 126.830077] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.830291] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.830583] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.830829] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.880736] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.880757] ata_scsi_dump_cdb: CDB (1:0,0,0) 12 01 80 00 60 00 00 00 24
[ 126.880780] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.880788] ata_scsi_dump_cdb: CDB (1:0,0,0) 12 00 00 00 60 00 00 00 24
[ 126.880791] ata_scsiop_inq_std: ENTER
[ 126.880826] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.880834] ata_scsi_dump_cdb: CDB (1:0,0,0) 12 01 83 00 60 00 00 00 24
[ 126.880857] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.880864] ata_scsi_dump_cdb: CDB (1:0,0,0) 12 01 80 00 60 00 00 00 24
[ 126.914346] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.914367] ata_scsi_dump_cdb: CDB (2:0,0,0) 12 01 80 00 60 00 00 00 24
[ 126.914391] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.914398] ata_scsi_dump_cdb: CDB (2:0,0,0) 12 00 00 00 60 00 00 00 24
[ 126.914401] ata_scsiop_inq_std: ENTER
[ 126.914435] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.914443] ata_scsi_dump_cdb: CDB (2:0,0,0) 12 01 83 00 60 00 00 00 24
[ 126.914486] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.914493] ata_scsi_dump_cdb: CDB (2:0,0,0) 12 01 80 00 60 00 00 00 24
[ 126.947093] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.947114] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 01 80 00 60 00 00 00 24
[ 126.947137] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.947145] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 00 00 00 60 00 00 00 24
[ 126.947148] ata_scsiop_inq_std: ENTER
[ 126.947182] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.947189] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 01 83 00 60 00 00 00 24
[ 126.947213] program scsi_unique_id is using a deprecated SCSI ioctl, please convert it to SG_IO
[ 126.947221] ata_scsi_dump_cdb: CDB (6:0,0,0) 12 01 80 00 60 00 00 00 24
[ 127.560220] kjournald starting. Commit interval 5 seconds
[ 127.560297] EXT3 FS on hda1, internal journal
[ 127.560303] EXT3-fs: mounted filesystem with ordered data mode.
[ 139.382882] r8169: eth0: link up

Attachment: messages1.gz
Description: GNU Zip compressed data