Re: Slow disks.

From: Rogier Wolff
Date: Wed Dec 22 2010 - 05:43:17 EST



Unquoted text below is from either me or my friend.


Someone suggested we try an older kernel, as if kernel 2.6.32 would not
have this problem. We do NOT think it suddenly started with a certain
kernel version. I was just hoping you kernel guys could help with
prodding the kernel into revealing which component is screwing things
up.


On Mon, Dec 20, 2010 at 01:32:44PM -0500, Greg Freemyer wrote:
> On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont
> <bonbons@xxxxxxxxxxxxxxxxx> wrote:
> > Hi,
> >
> > [ccing linux-ide]
> >
> > Please provide the part of kernel log showing initialization of your
> > disk controller(s) as well as detection of all the discs.


sata_sil 0000:03:01.0: version 2.4
sata_sil 0000:03:01.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
sata_sil 0000:03:01.0: Applying R_ERR on DMA activate FIS errata fix
scsi2 : sata_sil
scsi3 : sata_sil
scsi4 : sata_sil
scsi5 : sata_sil
ata3: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed200080 irq 24
ata4: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed2000c0 irq 24
ata5: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed200280 irq 24
ata6: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed2002c0 irq 24
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata3.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
ata3.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: configured for UDMA/100
scsi 2:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
usb 2-2: new low speed USB device using uhci_hcd and address 2
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4.00: ATA-7: SAMSUNG HD103SI, 1AG01118, max UDMA7
ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/100
scsi 3:0:0:0: Direct-Access ATA SAMSUNG HD103SI 1AG0 PQ: 0 ANSI: 5
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata5.00: configured for UDMA/100
scsi 4:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
ata6.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata6.00: configured for UDMA/100
scsi 5:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
sd 5:0:0:0: [sdd] Write Protect is off
sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2 sdb3 sdb4
sd 3:0:0:0: [sdb] Attached SCSI disk
sda: sda1 sda2 sda3 sda4
sd 2:0:0:0: [sda] Attached SCSI disk
sdc: sdc1 sdc2 sdc3 sdc4
sd 4:0:0:0: [sdc] Attached SCSI disk
sdd: sdd1 sdd2 sdd3 sdd4
sd 5:0:0:0: [sdd] Attached SCSI disk



> > Verbose lspci output for the disc controller and $(smartctl -i -A $disk)
> > output might be useful as well.


03:01.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
        Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: I/O ports at 4020 [size=8]
        Region 1: I/O ports at 4014 [size=4]
        Region 2: I/O ports at 4018 [size=8]
        Region 3: I/O ports at 4010 [size=4]
        Region 4: I/O ports at 4000 [size=16]
        Region 5: Memory at ed200000 (32-bit, non-prefetchable) [size=1K]
        [virtual] Expansion ROM at e8000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
        Kernel driver in use: sata_sil
        Kernel modules: sata_sil


We also tried the onboard controller:

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
        Subsystem: Super Micro Computer Inc Device 7980
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at 01f0 [size=8]
        Region 1: I/O ports at 03f4 [size=1]
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374 [size=1]
        Region 4: I/O ports at 30a0 [size=16]
        Kernel driver in use: ata_piix
        Kernel modules: ata_generic, pata_acpi, ata_piix, ide-pci-generic, piix

smartctl output:

smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format) family
Device Model: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV55759454
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Dec 21 20:06:00 2010 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   132   119   021    Pre-fail  Always       -       6391
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
193 Load_Cycle_Count        0x0032   164   164   000    Old_age   Always       -       109955
194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


The others are very similar....
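
To make "very similar" checkable, here is a minimal Python sketch that
pulls the raw values out of `smartctl -A` style output and lists the
attributes that differ between two disks. The function names are made
up for illustration, DISK_A is a three-row excerpt of the output above
(with the wrapped lines joined), and DISK_B is a hypothetical second
disk, not real data from this system.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output.

    Only rows that start with a numeric ID and carry at least ten
    whitespace-separated columns are treated as attribute rows.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                # Raw value is not a plain integer; skip it in this sketch.
                continue
    return attrs


def differing_attributes(a, b):
    """Return {name: (value_a, value_b)} for attributes that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Excerpt from the disk shown above.
DISK_A = """\
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
193 Load_Cycle_Count        0x0032   164   164   000    Old_age   Always       -       109955
194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       38
"""

# HYPOTHETICAL second disk, invented only to exercise the comparison.
DISK_B = """\
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
193 Load_Cycle_Count        0x0032   168   168   000    Old_age   Always       -       98000
194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       38
"""

print(differing_attributes(parse_smart_attributes(DISK_A),
                           parse_smart_attributes(DISK_B)))
```

In practice the two strings would come from running smartctl -A on each
disk; the sketch deliberately does not call smartctl itself.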


> >
> > Did you try the individual discs on a completely different system (e.g.
> > plain desktop system) and what revision of SATA are both components
> > supporting?

Yes, I did. The disks were installed in an MSI/Core 2 Duo based desktop
system. No problems at all. Transfer rates up to 200MB/s.


The SiI 3114 chip is 1.5 Gbps SATA.


Searching for information on the WD drives I stumbled across:

http://community.wdc.com/t5/Other-Internal-Drives/1-TB-WD10EARS-desynch-issues-in-RAID/m-p/11559

There it seems that WD simply says not to use these drives in a RAID.
I have experience with "Raid Edition" drives: they go bad at a MUCH
too high rate. If we can't use the non-RAID drives for a RAID
application, then there is just ONE possible option: STAY AWAY FROM
WESTERN DIGITAL.

Western Digital claims it has the right to mess things up if you put a
non-RAID drive in a RAID configuration. Well, fine. Then they can also
mess things up in normal situations, because with Linux software RAID
the accesses the drive sees are no different from normal accesses.

(If you click through and read their entry in the knowledge base,
you'd notice that it should be more or less the other way
around: a RAID-enabled drive reports an error on a bad sector within
seven seconds, so Linux will drop it from the RAID, whereas the desktop
drive would remain operational until Linux times out (30 seconds?).)
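
The timing argument can be written down as a small Python sketch. It
assumes the per-device command timeout is exposed in
/sys/block/<disk>/device/timeout (in seconds) and that 30 is the usual
default; the function names and the "None means the drive keeps
retrying" convention are mine, purely for illustration.

```python
from pathlib import Path


def kernel_timeout_seconds(disk, sysfs_root="/sys/block"):
    """Read the kernel's command timeout for a disk such as 'sda'."""
    path = Path(sysfs_root) / disk / "device" / "timeout"
    return int(path.read_text().strip())


def who_gives_up_first(drive_recovery_limit_s, kernel_timeout_s=30):
    """Say which side ends a stuck read first.

    drive_recovery_limit_s: seconds after which the drive itself reports
    the sector as bad (about 7 for the RAID-edition behaviour described
    above), or None for a drive that just keeps retrying.
    """
    if drive_recovery_limit_s is None:
        return "kernel"
    if drive_recovery_limit_s < kernel_timeout_s:
        return "drive"
    return "kernel"


print(who_gives_up_first(7))     # RAID-edition style drive
print(who_gives_up_first(None))  # desktop drive that keeps retrying
```

With a 7 second limit the drive reports the error long before the kernel
timeout fires, which is the "other way around" case described above.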



More hardware info:

System: Supermicro PDSMi, 4xDDR2 1GB, disks and controllers as above.
Current kernel version: 2.6.36.2
The problem was also present in kernel 2.6.33 (sorry, we cannot downgrade
again. This is a production system...)

uname -a:
Linux jcz.nl 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:32:37 CET 2010 x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux

Disk layout:

major minor    #blocks  name

   8     0   976762584  sda
   8     1      240943  sda1
   8     2    19535040  sda2
   8     3     1951897  sda3
   8     4   955032120  sda4
   8    16   976762584  sdb
   8    17      240943  sdb1
   8    18    19535040  sdb2
   8    19     1951897  sdb3
   8    20   955032120  sdb4
   8    32   976762584  sdc
   8    33      240943  sdc1
   8    34    19535040  sdc2
   8    35     1951897  sdc3
   8    36   955032120  sdc4
   8    48   976762584  sdd
   8    49      240943  sdd1
   8    50    19535040  sdd2
   8    51     1951897  sdd3
   8    52   955032120  sdd4
   9   127      240832  md127
   9     1    39067648  md1
   9   126  1910063104  md126
   9   125     3903488  md125
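
The listing above gives partition sizes but not where each partition
starts. Since smartctl reports the WD drives as "Adv. Format" (4096-byte
physical sectors behind the 512-byte logical blocks the kernel shows),
the start sectors might be worth checking too. A rough Python sketch,
assuming the start sector can be read from
/sys/block/<disk>/<partition>/start; the function names are made up, and
the start values in the example calls are hypothetical because the real
ones are not in this mail.

```python
from pathlib import Path


def is_4k_aligned(start_sector, logical_block_size=512,
                  physical_block_size=4096):
    """True if the partition's first byte falls on a physical-sector boundary."""
    return (start_sector * logical_block_size) % physical_block_size == 0


def read_partition_start(disk, partition, sysfs_root="/sys/block"):
    """Read a partition's start sector, e.g. disk='sda', partition='sda1'."""
    path = Path(sysfs_root) / disk / partition / "start"
    return int(path.read_text().strip())


# HYPOTHETICAL start sectors, only to show both outcomes.
print(is_4k_aligned(63))    # old cylinder-style start
print(is_4k_aligned(2048))  # 1 MiB boundary
```

This only checks alignment; it says nothing about whether alignment is
what is hurting performance on this system.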

MDstat:

Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : active raid5 sdd3[5](S) sdb3[4] sda3[0] sdc3[3]
      3903488 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md126 : active raid5 sda4[0] sdd4[3] sdc4[5](S) sdb4[4]
      1910063104 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid5 sda2[0] sdd2[3](S) sdb2[1] sdc2[4]
      39067648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md127 : active raid1 sdd1[3](S) sda1[0] sdb1[1] sdc1[2]
      240832 blocks [3/3] [UUU]

unused devices: <none>

Mounts:

rootfs / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
sys /sys sysfs rw,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=506317,mode=755 0 0
/dev/disk/by-label/rootfs / ext4 rw,relatime,barrier=1,stripe=256,data=ordered 0 0
devpts /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
/dev/md127 /boot ext3 rw,relatime,errors=continue,barrier=0,data=writeback 0 0
/dev/md126 /data ext4 rw,relatime,barrier=1,data=ordered 0 0


Because of the severity of the problems (which remain after trying
another SATA card), I have already bought a new Supermicro server. Let's
hope that helps.




--
** R.E.Wolff@xxxxxxxxxxxx ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ