LVD drive with ncr53c8xx SCSI driver causes crash and corruption of ext2fs under heavy load

Bradley M. Kuhn (bkuhn@ebb.org)
Wed, 17 Feb 1999 04:39:01 -0500


[1.] One line summary of the problem:

LVD drive with ncr53c8xx SCSI driver causes crash and corruption of ext2fs under heavy load

[2.] Full description of the problem/report:

I am having a problem with the ncr53c8xx SCSI driver in Linux 2.2.1
with a Tekram DC-390U2B/W card, which uses the ncr53c895 chipset. I am using
this card with a Seagate Barracuda ST39173LW 8.7 GB drive.

The problem is reproducible.

If I install a new system, compile and install 2.2.1, and then begin using
it, the system locks up as soon as drive access gets intensive. The hard
drive LED stays lit solid.

I use a soft or hard reset to reboot the machine, and inevitably the ext2fs
file systems are corrupted, with many bad blocks in various inodes.

I can force the behavior if I do the following:

find / -xdev -depth -print | cpio -o > /tmp/foo.cpio.gz &
sleep 10
find / -xdev -ls

If I let that run, the ext2fs file system eventually becomes badly
corrupted and is mostly unrepairable with fsck. Most of the fsck output has
to do with duplicate blocks in inodes.
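
In case it helps anyone trying to reproduce this, here is a rough sketch that
wraps the commands above in a shell function. The function name stress_disk
and its three arguments (tree to walk, archive file, delay in seconds) are made
up for this example; the defaults are simply the values from the commands
above. I have not verified that this triggers the problem any more reliably
than typing the commands by hand.

```shell
#!/bin/sh
# Sketch only: wraps the reproduction commands in a function.
# stress_disk and its target/archive/delay arguments are illustrative
# names; the defaults match the original commands.

stress_disk() {
    target=${1:-/}
    archive=${2:-/tmp/foo.cpio.gz}
    delay=${3:-10}

    # Background job: archive every file under the target with cpio.
    find "$target" -xdev -depth -print | cpio -o > "$archive" &
    archiver=$!

    # Give the archiver a head start, then add a second tree walk on top.
    sleep "$delay"
    find "$target" -xdev -ls

    # Wait for the background cpio so the caller sees its exit status.
    wait "$archiver"
}

# Example (equivalent to the commands above):
#   stress_disk / /tmp/foo.cpio.gz 10
```

Pointing the first argument at a smaller directory should give a lighter load
for a first try; on my machine it is the full walk of / that leads to the
lockup.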

I usually cannot get the system back up (since / is so badly corrupted),
but when I can, there is nothing of interest in any of the logs (of course,
since it's a drive problem, nothing can really be written to the logs).

As the README for the ncr53c8xx suggests, I have tried configuring the drive
in the setup utility with:
- only asynchronous data transfers
- tagged commands disabled
- disconnections not allowed

but I am still able to reproduce this problem.

Can anyone give me any information about how I might solve this problem?
The system works fine under normal use, but heavy drive access makes it
crash and badly corrupt the ext2fs file systems. Since I can't guarantee that
the drive won't be heavily used, this has caused me to reinstall numerous times.

I have not ruled out this being a configuration problem, but I have checked
that I have followed everything in README.ncr53c8xx, as well as my card's
documentation, and I am unable to stop this from happening.

The drive works *fine* under light usage, but cannot handle heavy usage.

[3.] Keywords (i.e., modules, networking, kernel):

ncr53c8xx, SCSI, LVD, Ultra2, Tekram, Seagate

[4.] Kernel version (from /proc/version):
Linux version 2.2.1 (root@localhost.localdomain) (gcc version 2.7.2.3) #2
Wed Feb 17 01:23:54 EST 1999

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

Linux atheist 2.2.1 #2 Wed Feb 17 01:23:54 EST 1999 i586 unknown
Kernel modules 2.1.85
Gnu C 2.7.2.3
Binutils 2.9.1.0.4
Linux C Library 2.0.7
Dynamic linker ldd (GNU libc) 2.0.7
Linux C++ Library 2.8.0
Procps 1.2.7
Mount 2.7l
Net-tools 1.33
Kbd 0.94
Sh-utils 1.16

[7.2.] Processor information (from /proc/cpuinfo):
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 350.802387
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep pge mmx 3dnow
bogomips : 699.60

[7.3.] Module information (from /proc/modules):

(No modules)

[7.4.] SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST39173LW Rev: 6246
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: MICROP Model: 3243-19 1128RV Rev: 28RV
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: iomega Model: jaz 1GB Rev: H.72
Type: Direct-Access ANSI SCSI revision: 02

(The problem is with ID0. The other devices seem to function well, even
under heavy load)

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

> cat /proc/scsi/ncr53c8xx/0
General information:
Chip NCR53C895, device id 0xc, revision id 0x1
IO port address 0xec00, IRQ number 11
Using memory mapped IO at virtual address 0xc8004000
Synchronous period factor 10, max commands per lun 32

-- 
      Bradley M. Kuhn   |     bkuhn@ebb.org    |   http://www.ebb.org/bkuhn
