Re: IDE + SMP Lockup (no OOPS) in 2.2.12, 2.2.10

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Thu, 30 Sep 1999 09:45:02 +0200 (MEST)


Jes Sorensen wrote:
> >>>>> "Rogier" == Rogier Wolff <R.E.Wolff@BitWizard.nl> writes:
>
> Rogier> Alan Cox wrote:
> >> > My question is: what is the right behavior for sharing an irq in
> >> > a situation like this? Is it considered not something you can
> >> > do, or
>
> >> PCI IRQ sharing should be bulletproof. Anything else is a bug - be
> >> it firmware or software
>
> Rogier> Hardware, firmware or software.
>
> Rogier> (The BP6 has several features not commonly found anywhere else
> Rogier> (dual celeron, quad IDE ports), so to me it is entirely
> Rogier> possible that there is a hardware bug somewhere.)
>
> If the interrupts get delivered then it is most likely a driver bug. A
> hardware bug would either give you bogus interrupts (which the driver
> is supposed to check for by reading the interrupt status register of
> the device, before doing _any_ processing on the card) or you get no
> interrupts and you have a real problem.
>
> Most of the interrupt sharing problems I have seen have been due to
> driver writers forgetting to check whether an interrupt actually came
> from the device before they start processing it.

Right, and as I trust the IDE driver to do this right, I expect it
to work.
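
The check Jes describes can be sketched in plain user-space C. This is
only an illustration, not the IDE driver's code: the device model, the
FAKE_IRQ_PENDING bit and every function name below are made up for the
example. The point is just that each handler on a shared line reads its
own status register first and touches nothing if the interrupt was not
its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: one status register with a "pending" bit. */
#define FAKE_IRQ_PENDING 0x01u

struct fake_dev {
    uint32_t status;   /* stands in for the device's interrupt status register */
    unsigned handled;  /* number of interrupts this driver actually serviced */
};

/* Returns true if the interrupt was ours and was serviced, false otherwise. */
bool fake_irq_handler(struct fake_dev *dev)
{
    assert(dev != NULL);

    /* Check the status register before doing _any_ processing. */
    if (!(dev->status & FAKE_IRQ_PENDING))
        return false;                   /* not ours: leave the device alone */

    dev->status &= ~FAKE_IRQ_PENDING;   /* acknowledge the interrupt */
    dev->handled++;                     /* "real" processing would go here */
    return true;
}

/* Simulates a shared line: every registered handler is called in turn.
 * Returns how many devices claimed the interrupt. */
unsigned fake_dispatch_shared_irq(struct fake_dev *devs[], size_t n)
{
    unsigned claimed = 0;

    for (size_t i = 0; i < n; i++)
        if (fake_irq_handler(devs[i]))
            claimed++;
    return claimed;
}
```

With two devices on the line and only the second one pending, the
dispatch calls both handlers but only the second does any work. A
dispatch with nothing pending models a bogus interrupt, and no handler
claims it.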

But the hardware could still be at fault. For example:

To get some extra speed out of a drive, some manufacturers raise the
interrupt a few microseconds before the data is fully available. With
faster and faster hardware, it becomes possible to empty the buffer
before the data is fully available: data corruption. Now, by sharing
the interrupt, the OTHER drive may be interrupting, and the processor
gets that few-microsecond lead....
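
To put numbers on that race, here is a toy timing model in C. It is a
sketch under made-up assumptions: the function names and all the
microsecond values are hypothetical, and no real drive or chipset is
being described. Times are measured from the moment the drive asserts
its interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* The read is safe only if the CPU reaches the buffer no earlier than
 * the data does.
 *   data_ready_after_us: how long after the IRQ the data becomes valid
 *   cpu_latency_us:      how long after the IRQ the driver starts reading */
bool read_is_safe(unsigned data_ready_after_us, unsigned cpu_latency_us)
{
    return cpu_latency_us >= data_ready_after_us;
}

/* On a shared line the CPU may already be running the handlers because
 * the other drive interrupted. head_start_us is how much of the usual
 * latency that saves; the result never goes below zero. */
unsigned effective_latency_us(unsigned cpu_latency_us, unsigned head_start_us)
{
    if (head_start_us >= cpu_latency_us)
        return 0;
    return cpu_latency_us - head_start_us;
}
```

Say the data is valid 3 us after the interrupt. A CPU that needs 10 us
to get to the buffer is safe, and one that needs 2 us is not. The 10 us
CPU also becomes unsafe once the other drive's interrupt hands it an
8 us head start, since effective_latency_us(10, 8) is 2.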

When I was in the same room with the hardware guy, we had the rule:
"if it's untested it won't work". Well, if Abit tested that board with
NT, its drive subsystem probably didn't get a workout comparable with
what it's getting from Linux. So: "It's untested and may not work".

I'm not saying that that's what is going on, but a hardware problem
like that could sure be possible.

Tell me. Is the Linux serial driver reliable? I'm seeing
data corruption. That's for sure. Ok. There is a neato, new PCI serial
chip involved. The chip looks reliable to me. The driver looks
reliable to me. Where is the data getting corrupted? Anyway: Something
to do for today...

Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
------ Microsoft SELLS you Windows, Linux GIVES you the whole house ------
