Re: 2.1.111: IDE DMA disabled?

Doug Ledford (dledford@dialnet.net)
Wed, 29 Jul 1998 19:14:11 -0500


Mark Lord wrote:
>
> Doug,
>
> Does this mean your system has an "IDE DMA corruption problem"?
>
> If so, then *please* tell me all of the gory details.
>
> Otherwise...

Had would be the operative word. I couldn't let it continue :) Here's a quick
rundown of what I went through. First, the machine is a single board computer
system with a passive backplane. The IDE was part of the SBC, so any
complaints about PCI timings through the backplane don't apply. The drive
itself was a roughly 540MB drive and I can't even remember the exact make of
it now. The controller was a 430HX-based Triton controller. The scenario was
that we would get the occasional "bit in free list already cleared" or
whatever that message is. We would reboot every now and then to the
(mandatory due to the errors) fsck that would then occasionally find missing
files, cross linked files, whatever and fix them. We lived with this for
about 1 month of various reboots, fixes, late nights taking care of a system
that couldn't be down, etc. I then recompiled the kernel without the
Triton-DMA support (yes, this was a 2.0.x based system) and never again saw
this problem. The machine in question really only did two things. One, it
passed packets around between two 100Mbit/s ethernets and one 10Mbit/s
ethernet using three Tulip based cards. Two, it is a syslog server that
takes the logs from all of our machines, saves them off, nightly rotates the
various logs, compresses the older logs, and removes any logs older than 180
days. When our logs got too big to fit on that 500+ MB IDE drive, it was
replaced with a 3GB SCSI drive and a 7880P Adaptec controller that was also
built into the SBC. From the time that I disabled Triton-DMA until I
switched it to SCSI (in the range of 4 to 6 months), I never had another
corruption problem. With the aic7xxx driver and the Quantum hard disk I
have in there now, using DMA, I never have any problem. When I disabled
Triton-DMA to test the problem, that was the *only* change I made to the
kernel, and I didn't touch the startup scripts or anything else one bit.
The test was conclusive enough for me that, on the remaining systems I've
had with IDE drives, I've always left DMA disabled (well, this one
experience and the fact that the 2.0.x IDE system tried to eat a WDC 1.6GB
hard drive I had in another machine, which was really WD's problem, but it
still left me somewhat "concerned").

Now, having said all that, I'll grant that most people who use IDE can
safely use DMA. But there are those bad systems, bad cases, whatever. I
can also understand reluctance to turn down the performance by default for
these broken systems. I hated it when I finally disabled tagged queueing in
the aic7xxx driver by default. As a matter of fact, enabling tagged
queueing in the make *config process doesn't actually turn tagged queueing
on in my current driver. The only way to turn it on is by passing it setup
commands during boot (or insmod) time or by modifying the aic7xxx.c file. I
did this because even though I know the tagged queueing support in my driver
is *right* dammit, it doesn't change the fact that:

1) Iomega Jaz drives will silently corrupt data if you use tagged queueing
with them. They also report in the Inquiry data that they *can* do tagged
queueing, and will even perform the commands as normal; they simply
overwrite things when you do so.

2) Quantum drives are horrible about having firmware that will break when
pushed to the limit of their queue depth on a regular basis.

3) Micropolis drives should be called Micrapolis when it comes to their
ability to do the tagged queueing they also claim to support.

etc, etc. I think this is what Linus is pushing for. In my case, I did it
voluntarily because I was tired of chasing down problems caused by the
tagged queueing. The scsi.c blacklist didn't catch all of the bad devices,
and I knew that I personally couldn't catch them all either, so I defaulted
everything to off with manual intervention required to turn it back on.
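
To make the difference between the two policies concrete, here is a rough
sketch in C. It is not code from aic7xxx.c or scsi.c: the struct, the function
names, and the vendor strings are all hypothetical placeholders. It only shows
why a device that claims tagged queueing but is missing from the blacklist
slips through a default-on policy, and stays off under a default-off policy
until somebody asks for it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device record: a vendor string plus whether the device
 * claims (for example in its Inquiry data) that it can do tagged queueing. */
struct device_info {
    const char *vendor;
    int claims_tagged_queueing;
};

/* Illustrative blacklist only. The strings are placeholders, and by the
 * argument above no such list is ever complete. */
static const char *const tq_blacklist[] = { "IOMEGA", "QUANTUM", "MICROP" };

/* Return 1 if the vendor string starts with any blacklist entry. */
static int is_blacklisted(const char *vendor)
{
    size_t i;

    for (i = 0; i < sizeof(tq_blacklist) / sizeof(tq_blacklist[0]); i++) {
        if (strncmp(vendor, tq_blacklist[i], strlen(tq_blacklist[i])) == 0)
            return 1;
    }
    return 0;
}

/* Policy A: default on, trust the blacklist to catch the bad devices. */
int tq_enabled_blacklist_policy(const struct device_info *dev)
{
    return dev->claims_tagged_queueing && !is_blacklisted(dev->vendor);
}

/* Policy B: default off. Tagged queueing is used only when the user has
 * explicitly asked for it and the device claims to support it. */
int tq_enabled_opt_in_policy(const struct device_info *dev, int user_opt_in)
{
    return user_opt_in && dev->claims_tagged_queueing;
}
```

Under policy A a broken drive that nobody has listed yet gets tagged queueing
turned on silently. Under policy B the same drive stays off until the user
opts in, which is the trade of default performance for default safety
described above.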

-- 

Doug Ledford <dledford@dialnet.net> Opinions expressed are my own, but they should be everybody's.
