Re: 2.1.125 - Problem. & Aic7xxx 5.1.0

Gerard Roudier (groudier@club-internet.fr)
Tue, 10 Nov 1998 21:31:15 +0100 (MET)


Doug,

On Mon, 9 Nov 1998, Doug Ledford wrote:

> Gerard Roudier wrote:

[ ... ]

> > Hello Doug,
> >
> > I just downloaded 2.1.125 and looked into our pre-2.2 kernel.
> > No need to tell I think it is perhaps broken in some driver place. ;-)
> >
> > If I have been able to understand correctly how the thing work:
> >
> > C code isr:
> > -----------
> > C0) loop until no interrupt condition:
> > C1) if command complete
> > C2) clear the commande complete flag from the isr
> > C3) proceed completed commands
> > etc ...
> >
> > Sequencer:
> > ----------
> > S1) push the complete SCB to the complete queue
> > S2) raise the command complete interrupt flag
> >
> > As you driver is supporting MMIO, C2 may be posted as a PCI write
> > transaction. If S1 is also a PCI memory write from the sequencer, it also
> > may be posted (I am not sure of that, but I am sure you will confirm or
> > infirm).
>
> It is specifically these types of things that the presence of the mb() macro
> at the end of every aic_outb() and aic_inb() call is intended to stop. But,

The mb() has nothing to do with PCI ordering rules (when you are using
memory mapped IOs).

> the mb() goes further and also solves things like weak ordering problems on
> K6 and the latest PII processors that might re-order certain instructions
> without some sort of lock. If the mb() macro doesn't do so, then someone
> please feel free to speak up, but that was my understanding when I was
> working with Linus to get the right code in there so these wouldn't be
> problems.

The mb() macros works at processor level and for example allows to flush
write buffers. The concern is PCI ordering rules and especially rules
that applies to posted memory writes that are handle by the bridge and not
the processor.

> > If S1 occurs after C0 and before C2 and both S1 and C2 are posted, then
> > your driver can miss the latest SCB completed by S1. If there is no other
> > pending commands, the driver may not recieve any further interrupt for
> > the latest completed SCB.
> >
> > I also noticed that you are using the following contruct for handling
> > interrupts:
> >
> > do
> > {
> > aic7xxx_isr(irq, dev_id, regs);
> > } while ( (aic_inb(p, INTSTAT) & INT_PEND) );
> >
> > I often noticed this in BSD stuff. If the purpose is not to lose PCI
> > interrupt, let me tell you that such a construct is, in my opinion, kind
> > of bad quality band-aid.
>
> This is the comment that made me let this letter set for a month. Gerard,
> if you seriously think I am so lame as to use such a construct as a
> "band-aid" to avoid loosing PCI interrupts, then we have little if anything
> else further to discuss. You and I are obviously in two different worlds.

It is also executed for PCI. It is useless and may somtimes add too much
latency for other devices that are sharing the IRQ with your driver.

> In any case, for the sake of others on the kernel list, let me elaborate on
> the code in question.
>
> I originally wrote that code without the do { } while loop. It was a

Nice.

> single shot run through the interrupt handler and then return. At the time,
> the FreeBSD driver *did* have a while loop for its interrupt handler, I
> simply chose not to follow it. It worked fine on PCI cards, but it would

I did the same almost immediately for the the ncr driver and kept with
this change but really fixes things by ensuring that the flag that is
tested by the C code for IO completion is the last data written from the
SCRIPTS and by performing a dummy read from the C code in order to not
lose completions. BTW, the new 896 driver has the SCRIPT arranged in a way
that avoid this dummy read.

> sometimes hang on VLB cards. Then I found why, it was hanging around the two
> sections of code where we enable REQINIT handling. The reason, when we
> enabled REQINITs then cleared the SEQINT, we missed the edge the first
> REQINIT would have caused, so VLB cards with their edge triggered interrupts
> would miss the interrupt. I corrected the problem, made note of it in my
> comments, then informed Justin Gibbs of it. He corrected it in his driver

Handling EISA/VLB the same way as PCI is not the way to go if PCI has to
be victimized, IMO.

> as well and removed his while loop. I later added the loop back as an
> optimization. The exact code area where I made comments about loosing the
> interrupt (search for AWAITING_MSG in handle_seqint()) is a prime example.
> By the time we have handled the seqint, we've already raised the REQINIT
> flag and are ready for another interrupt. I could simply return and let the
> interrupt that's pending in the core interrupt code re-call my interrupt
> handler, or I could take care of as many REQINIT interrupts as possible
> before returning. I don't know about the NCR hardware, but the Adaptec
> hardware *is* fast enough that we can end up with immediate re-enter
> interrupt situations.

I will try to understand your explanation from your driver source when
I will have time for.
IMO, it is not a question to be fast or not to be fast, when IRQ are
shared, some event may happen at the time your IRQ handler is called for
an interrupt condition of another device. Even being slow is not an
option. BTW, when the ncr driver has to handle less INTFLYs than the
number of completed IOs, that means that the NCR has been faster for
interrupts that the CPU could handle them, and this happens sometimes.

> > On the other hand, you seem to import lots of BSD
> > material into Linux kernel. I did so years ago. I think that Linux has
> > most subtlenesses in various places than BSD stuff, even if it may look
> > sometimes a bit more broken. If would suggest you to be a bit more
> > selective in BSD stuff importation into the Linux kernel.
>
> Some times your english is broken enough that's it's hard to understand
> exactly what you mean. This is one of those cases. However, two things are

I was refering to the BSD QUEUE handling stuff and it seems it has
disappeared now.
BTW, having a excellent english is too late for me and I know lots of guys
that can do good software without that. You just need to be able to
understand documentations that most of the time are written in english
language.

> clear here. One, that you think I import too much FreeBSD stuff, and two, a
> subtle indication of what you think about my experience/abilities
> (specifically, your tone is patronizing). I import the code that deals
> directly with the sequencer and the sequencer itself. I write the rest of
> that code myself. I may look at FreeBSD, but anyone who thinks I copy

I know you write the C code yourself and importing the sequencer is a good
thing if it is really well developped and it seems it is. I donnot discuss
that point at all.

> FreeBSD can look at things where the code isn't dictated by hardware
> ordering requirements and see I don't. Simply check the init code, the
> reset code, the queue code, etc. In all of those areas it becomes obvious
> that there actually is little code shared between FreeBSD and linux. Now,
> having said that, I'm going to ignore the rest of your comment as uneducated
> guessing.

My mail was indeed to harsh and I apologize to you for that.
I have been nervous to see that the QUEUE stuff of FreeBSD was in the
kernel. Not because I dislike it, but because it is Linus to decide on
this point and I thought you just shoe-horned the thing in your patch.

But I stick with my remarks on the technical issues regarding ordering and
you driver (and perhaps Justin's one too, but I haven't time to look into
that one).

Regards,
Gerard.

PS: If I am able to understand your technical explanations from your
source code, I will reply to you on that point later.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/