Re: IDE + SMP Lockup (no OOPS) in 2.2.12, 2.2.10

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Thu, 30 Sep 1999 11:55:39 +0200 (MEST)


Jes Sorensen wrote:
> >>>>> "Rogier" == Rogier Wolff <R.E.Wolff@BitWizard.nl> writes:
>
> Rogier> Jes Sorensen wrote:
> >> Most of the interrupt sharing problems I have seen have been due to
> >> driver writers forgetting to check whether an interrupt actually
> >> came from the device before they start processing it.
>
> Rogier> To get some extra speed out of a drive, some manufacturers
> Rogier> give the interrupt a few microseconds earlier than when the
> Rogier> data is fully available. With faster and faster hardware, it
> Rogier> becomes possible to empty the buffer before it is fully
> Rogier> available.... Data corruption. Now by sharing the interrupt,
> Rogier> the OTHER drive may be interrupting, and the processor gets
> Rogier> that few microsecond lead....
>
> Well it is supposedly the job of the driver to check that with the
> hardware that data is actually available before taking it for
> granted. This may for various reasons be problematic if the cheap
> designers did stupid things etc.

Right. If you really want to interrupt before the data is really
available, you end up marking "data ready" in the control register
too. It's easier that way. If you're cutting corners, cut them well.

Oh, the reason that I'm pretty sure that the PCI interrupt sharing
for IDE works is that otherwise it wouldn't work at all. You always
have to check for data.

One way to optimize a driver is to say that IF the check for "did you
interrupt me" takes just as long as "do you have data for me", then you
can forget the "did you interrupt me" and simply ask if it has any
data. I mean, if it didn't interrupt but it does have data, why not
handle it anyway? Saves the extra interrupt overhead.
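
To make that concrete, here is a minimal user-space sketch of the idea,
not code from any real driver. The struct, the STATUS_DATA_READY bit and
the function names are all made up for illustration; a real handler
would read the status from the hardware instead of a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bit: "the device has data for the host". */
#define STATUS_DATA_READY 0x01u

/* Stand-in for one device on a shared interrupt line (illustrative only). */
struct fake_dev {
    uint8_t status;   /* simulated status register */
    uint8_t data;     /* simulated data register */
    int handled;      /* how many times we took data from it */
};

/*
 * Service one device. We never ask "did you interrupt me?"; we only
 * ask "do you have data?". If it does, we take it, whether or not
 * this device was the one that raised the line.
 */
static bool dev_service(struct fake_dev *dev)
{
    assert(dev != NULL);

    if (!(dev->status & STATUS_DATA_READY))
        return false;               /* nothing to do for this device */

    (void)dev->data;                /* "read" the data register */
    dev->status &= (uint8_t)~STATUS_DATA_READY;
    dev->handled++;
    return true;
}

/* Shared-line handler: poll every device on the line for data. */
static int shared_irq(struct fake_dev *devs[], size_t n)
{
    int serviced = 0;

    for (size_t i = 0; i < n; i++)
        if (dev_service(devs[i]))
            serviced++;

    return serviced;                /* 0 means the interrupt was not ours */
}
```

The point is that a device without data is skipped no matter who
raised the interrupt, and a device with data is serviced even if it
had not (yet) interrupted.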

That's a trick that is completely according to the rules. But if the
NT driver happens not to do it that way, do you think it has been
tested? Anyway, Andre Hedrick said he was able to reproduce the
problem, and that means that the solution is in sight. It is very slow
debugging a driver with a client 1000 miles away saying: "no, it still
crashed".

> Rogier> Tell me. Is the Linux serial driver reliable? I'm seeing
> Rogier> datacorruption. That's for sure. Ok. There is a neato, new PCI
> Rogier> serial chip involved. The chip looks reliable to me. The
> Rogier> driver looks reliable to me. Where is the data getting
> Rogier> corrupted? Anyway: Something to do for today...
>
> Well serial is a medium that is almost guaranteed to lose data once in
> a while ;-)

Ok. Tell me: Why is the PCI serial chip with a 128-byte buffer
experiencing overruns, while the chip soldered to my motherboard
(probably in one of those SMC multi-io chips (*)) with only a 16-byte
buffer is NOT dropping characters?

What the hell is going on?

I'll be checking that the 128-byte buffer is indeed enabled, and after
that I don't know what to do anymore....
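
For reference, a rough sketch of the kind of check I mean, assuming
the chip follows the usual 16550A register layout (IIR bits 7:6 both
set when FIFOs are enabled, LSR bit 1 for overrun error). The helper
names and the rx_stats struct are made up here. Note that this only
tells you that FIFO mode is on, not how deep the FIFO is; selecting
the 128-byte depth is chip-specific and not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 16550A bit positions. */
#define LSR_DATA_READY   0x01u  /* LSR bit 0: receive data available */
#define LSR_OVERRUN      0x02u  /* LSR bit 1: overrun error */
#define IIR_FIFO_MASK    0xC0u  /* IIR bits 7:6 */
#define IIR_FIFO_ENABLED 0xC0u  /* both set: FIFOs enabled and usable */

/* Hypothetical counters for debugging the receive path. */
struct rx_stats {
    unsigned long ready;     /* LSR samples with data available */
    unsigned long overruns;  /* LSR samples with the overrun bit set */
};

/* True if an IIR value reports working FIFOs (16550A convention). */
static bool fifo_enabled(uint8_t iir)
{
    return (iir & IIR_FIFO_MASK) == IIR_FIFO_ENABLED;
}

/* Record what one sampled LSR value says about the receiver. */
static void note_lsr(struct rx_stats *st, uint8_t lsr)
{
    assert(st != NULL);

    if (lsr & LSR_DATA_READY)
        st->ready++;
    if (lsr & LSR_OVERRUN)
        st->overruns++;
}
```

In a driver the iir and lsr values would come from reading the UART
registers; here they are plain arguments so the logic can be tried
out in user space.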

Oh. I was suspecting the transmit path, but have now narrowed it down
to the receive path...

Roger.

(*) It's effectively on the ISA bus!

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
------ Microsoft SELLS you Windows, Linux GIVES you the whole house ------
