Re: IDE interrupt masking and serial ports (was Re: Lockout during heavy disk I/O)

Theodore Y. Ts'o (tytso@mit.edu)
Tue, 17 Nov 1998 18:13:07 -0500


Date: Mon, 16 Nov 1998 16:11:22 -0500 (EST)
From: "Benjamin C.R. LaHaise" <blah@kvack.org>

I saw that you mentioned PPP, so I'll assume you're running it over the
serial ports. In this case, the quick & dirty 'fix' for you is to run
hdparm -u1 /dev/hda (and repeat for every IDE hard disk in the system).

On the serial side of things, it looks like the receive fifo threshold
gets set to 8, giving 607us @115.2K or 1214us @57.6K to clean out the
fifo. Folks, help me on this one, but doesn't it take all of 35us to
transfer 512 bytes over an ISA bus? Let's say I'm off by a factor of
three: the IDE driver is then masking interrupts for 6 sectors' worth of
data?
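
As a rough check of the arithmetic above, here is a minimal sketch in
Python, working in microseconds. It rests on assumptions that are not
stated in the message: 10 bits on the wire per character (8N1 framing)
and 7 character times of headroom once the fifo trigger fires, chosen
because they reproduce the quoted 607 figure. The function and variable
names are hypothetical.

```python
def fifo_headroom_us(baud, chars=7, bits_per_char=10):
    """Time in microseconds for `chars` more characters to arrive.

    Assumes 8N1 framing (10 bits per character) and 7 characters of
    headroom after the receive trigger; both are assumptions.
    """
    return chars * bits_per_char * 1_000_000 / baud


# Per-sector ISA transfer time: the 35 us guess from the message,
# padded by the "factor of three" safety margin.
SECTOR_TRANSFER_US = 35 * 3

for baud in (115200, 57600):
    headroom = fifo_headroom_us(baud)
    sectors = headroom / SECTOR_TRANSFER_US
    print(f"{baud} baud: {headroom:.1f} us headroom, "
          f"about {sectors:.1f} sectors at {SECTOR_TRANSFER_US} us each")
```

At 115.2K the headroom comes out a little under 608 microseconds, which
is just under six padded sector transfers, so the "6 sectors" estimate
follows from the numbers given.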

Benjamin,

I think you're assuming that the problem is the IDE interrupt
masking. I normally run with the IDE interrupt settings in the default
position, and I have no trouble using the serial port --- and this is
with 2.1.128, so it's definitely not the case that all serial ports
suddenly stopped working between 2.1.127-pre2 and 2.1.128; both 2.1.127
and 2.1.128 work just fine for me, and there weren't any serial changes
made between the two.

I have seen some observations that at high speeds with 16450's,
the IDE masking will cause occasional serial characters to be dropped,
but most of the time people never notice it, because modern
communications protocols can handle this by retransmitting the packet.
So normally users just see a slight degradation in performance.

So I think there's something more complicated going on here. It
may very well be the IDE masking, but it's gotta be something a little
more complicated than that.

Now, I'll repeat a request made previously: would people with IDE HDs that
corrupt data when used with hdparm -u1 please step forward so we can start
making blacklists and further debugging?

It's not IDE HDs so much as specific IDE controller chips, I
believe. The problem is whether or not we can get good debugging
information at this late date. It's not clear that we have a big enough
sample of users who are running 2.1 and who would be willing to do the
hdparm -u1 test to ensure that we would find all of the problem
hardware in order to make the blacklist.

This is where we end up with the same problem which Microsoft has ---
with 7 million Linux users, many of whom are not technical hackers at
this point, if we make a change which causes people to lose data, they
won't know how to deal with it, and they will blame the OS instead of
their hardware. (I already get "bug reports" from users complaining
that ext2fs won't work, when the syslogs they forward me very clearly
indicate a hardware problem, so this isn't a hypothetical statement. If
we miss some hardware that should be on the blacklist and isn't, users
*will* blame the OS. It's not fair, but it's life.)

- Ted
