Re: another must-fix: major PS/2 mouse problem

From: Albert Cahalan (albert@users.sourceforge.net)
Date: Thu Jul 24 2003 - 20:46:41 EST


On Thu, 2003-07-24 at 13:30, Andrew Morton wrote:
> Albert Cahalan <albert@users.sourceforge.net> wrote:

> > Using the lockmeter on a 2.5.75 kernel, I discovered
> > that boomerang_interrupt() grabs a spinlock for over
> > 1/4 second. No joke, 253 ms. Interrupts are off AFAIK.
>
> boomerang_interrupt() doesn't disable interrupts. Is the NIC sharing the
> mouse's IRQ line?

No.

           CPU0
  0: 746770 XT-PIC timer
  1: 936 XT-PIC i8042
  2: 0 XT-PIC cascade
  4: 9 XT-PIC serial
  5: 0 XT-PIC uhci-hcd, uhci-hcd
 11: 2417 XT-PIC eth0
 12: 60 XT-PIC i8042
 14: 13844 XT-PIC ide0
 15: 2 XT-PIC ide1
NMI: 0
LOC: 751552
ERR: 0
MIS: 0

> boomerang_interrupt() is only used by nasty old NICs and yes, I guess it is
> possible that something has gone wrong and is causing occasional long spins
> in there.
>
> But I am more suspecting that you're not really using boomerang_interrupt()
> at all, and that something has gone wrong with lockmeter. What sort of NIC
> are you using?

I hope you don't consider a 100 Mb/s PCI device to be
a nasty old NIC. It's not an NE2000 you know! I have this:

00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
        Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
        Flags: bus master, medium devsel, latency 32, IRQ 11
        I/O ports at ec00 [size=128]
        Memory at df001000 (32-bit, non-prefetchable) [size=128]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

Without heavy net usage, boomerang_interrupt can take
as long as 1950 microseconds. That would be from mounting
an NFS filesystem and receiving broadcast packets.
I didn't have an opportunity to hit NFS hard today.

That's from rdtsc on a 1002-MHz Pentium III.

> Bear in mind that if some other device generates an interrupt while the CPU
> is running boomerang_interrupt(), lockmeter will count the time spent in
> that other device's interrupt as "time spent in boomerand_interrupt()".
> Which is very true, but it is not much help when one is trying to identify
> the source of the problem.

Do the Intel IRQ controller priority rules play a role here?

> Perhaps what you should do is to do an rdtsc on entry and exit of do_IRQ()
> and print stuff out when "long" periods of time in do_IRQ() are noticed.

I added code to the top and bottom of do_IRQ, as well as to
the top and bottom of boomerang_interrupt. The lockmeter was
compiled into the kernel but never enabled. I record the
minimum and maximum time in microseconds.

-------------------------------
IRQ num use min max
--- ------ -------- --- -------
  0 746770 timer 40 103595
  1 936 i8042 13 389773
  2 0 cascade - -
  3 - - - -
  4 9 serial 28 56
  5 0 uhci-hcd - -
  6 - - 711 711
  7 - - 25 25
  8 - - - -
  9 - - - -
 10 - - - -
 11 2417 eth0 87 1535331
 12 60 i8042 18 102895
 13 - - - -
 14 13844 ide0 8 51944
 15 2 ide1 7 11
NMI 0
LOC 751552
ERR 0
MIS 0
-------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jul 31 2003 - 22:00:24 EST