Re: I really hate to do this...

From: David Nash (dnash@chaos.demon.co.uk)
Date: Mon Jun 26 2000 - 17:36:11 EST


On Sun, 25 Jun 2000, Nasa wrote:

> Hi again,
>
> I have written to this group a couple of times about my system doing a lot of strange
> crashing. The replies I have gotten back have been helpful and have tried to point the blame
> at either XFree86 4.0 or maybe a hardware problem. Both of these seemed like likely causes,
> and because I could not identify anything specific as to when or what would cause a crash, I let it
> be and continued to watch these crashes to see if anything turned up. In addition, I ran quite a few
> diagnostic tests on the hardware (using the hard drive manufacturer's testing software, wintune98, the passmark-burnin
> test, etc.) and could not find any problems with the hardware. Well, that's not totally true, but I will get
> there.
>
> What has prompted me to write again (and hopefully not be thought too much of a fool) is that I have seen a
> series of kernel panics where I actually saw some output (most of the time the system just locks up and I don't get anything). In three different cases I got the following error prior to
> a kernel panic:
>
> mapaddr 0x74c881d4 not valid at usb-ohci.c:1397

Hello,

  I think you are seeing the same problem with usb-ohci as I have been
having with my Athlon / MSI K7 Pro system. It will oops and panic
after 24-48 hours of uptime, normally when the machine is idle. If
the usb-ohci module is not loaded my system runs without problems.

Symptom Summary:

Unable to handle kernel paging request at virtual address f4d28af4
Process swapper (pid: 0, stackpage=c0259000)
EIP [usb-ohci]rh_send_irq+c5/190
Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!

I've been capturing the dump with a Psion 3 attached as a serial
console. I've also seen a usable dump on a text console, but under X
the machine just freezes. Nothing is written to the logs, so you have
to capture the data externally.

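For anyone without a Psion handy, the receiving end of a serial console can be any machine that logs the port. Below is a minimal POSIX C sketch, assuming the crashing box was booted with its console on a serial port (for example console=ttyS0,9600) and that the logging machine is Unix-like; configure_serial() and copy_all() are hypothetical helper names, not part of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Put an open serial line into raw 8N1 mode at the given speed so the
 * console output arrives byte-for-byte, without echo or line editing.
 * Returns 0 on success, -1 on failure. */
static int configure_serial(int fd, speed_t speed)
{
    struct termios t;

    assert(fd >= 0);
    if (tcgetattr(fd, &t) < 0)
        return -1;
    t.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    t.c_oflag &= ~OPOST;
    t.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    t.c_cflag |= CS8 | CLOCAL | CREAD;   /* 8 data bits, ignore modem lines, enable receiver */
    t.c_cc[VMIN] = 1;                    /* read() blocks until at least 1 byte arrives */
    t.c_cc[VTIME] = 0;
    if (cfsetispeed(&t, speed) < 0 || cfsetospeed(&t, speed) < 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &t);
}

/* Copy bytes from in to out until read() returns 0 (end of file),
 * retrying on EINTR and on short writes. Returns bytes copied or -1. */
static long copy_all(int in, int out)
{
    char buf[512];
    long total = 0;

    assert(in >= 0 && out >= 0);
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

A caller would open the port with open("/dev/ttyS0", O_RDONLY | O_NOCTTY) (the device name is an assumption), call configure_serial(fd, B9600), then copy_all(fd, log_fd) into a file, so the oops text is saved even though the crashing machine never gets to write its own logs.
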
I've discussed this with David Brownell and Roman Weissgaerbe on the
linux-usb mailing list where there have been a few reports of this
sort of error under different conditions. David is working on a new
patch (which I'm currently running) to fix this and some CardBus-related
OHCI problems. In the meantime, I've included a small patch
which contains the critical code that fixes the oops for me.

With the patch applied, I see a single message logged and the system
continues to run, with USB still working fine. The additional readl()
lines are Roman's suggestion, to check whether just a single readl()
returns a bad value. I haven't hit the error with the extra readl()
calls yet, so additional information is welcome.

   David

--- linux-test2-pre11/drivers/usb/usb-ohci.c Fri Jun 23 10:54:41 2000
+++ linux/drivers/usb/usb-ohci.c Mon Jun 26 23:00:00 2000
@@ -1392,6 +1392,13 @@
        __u8 data[8];
 
        num_ports = readl (&ohci->regs->roothub.a) & RH_A_NDP;
+       if (num_ports > MAX_ROOT_PORTS) {
+               err ("bogus roothub registers for OHCI %s", ohci->ohci_dev->slot_name);
+               err ("num_ports: %d", num_ports);
+               err ("num_ports 2nd try: %d", readl (&ohci->regs->roothub.a) & RH_A_NDP);
+               err ("num_ports 3rd try: %d", readl (&ohci->regs->roothub.a) & RH_A_NDP);
+               return 0;
+       }
        *(__u8 *) data = (readl (&ohci->regs->roothub.status) & (RH_HS_LPSC | RH_HS_OCIC))
                ? 1: 0;
        ret = *(__u8 *) data;

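To make the effect of the check easier to follow, here is a minimal user-space sketch in C. It is not the driver code: fake_readl(), checked_num_ports() and fill_bitmap() are hypothetical names, and the constants (RH_A_NDP as the 0xff mask of the NumberDownstreamPorts field, MAX_ROOT_PORTS as 15) are assumptions rather than copies of usb-ohci.h. Since the patch does not show the rest of rh_send_irq(), fill_bitmap() also assumes that the code after the check sets bit i + 1 of data[8] for each port i; under that assumption a garbage port count, for example from an all-ones register read, would write far past the end of the 8-byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed constants (not copied from the driver): RH_A_NDP masks the
 * NumberDownstreamPorts field, bits 7:0 of root hub descriptor A;
 * MAX_ROOT_PORTS is taken as 15, the OHCI limit. */
#define RH_A_NDP       0xffu
#define MAX_ROOT_PORTS 15

/* Hypothetical stand-in for readl(): returns canned register values
 * in order, so a transient bad read can be simulated in user space. */
static const uint32_t *fake_reg;
static int fake_idx;

static uint32_t fake_readl(void)
{
    return fake_reg[fake_idx++];
}

/* Mirrors the patch: reject an impossible port count and re-read the
 * register twice to show whether the bad value was a one-off. */
static int checked_num_ports(void)
{
    int num_ports = (int)(fake_readl() & RH_A_NDP);

    if (num_ports > MAX_ROOT_PORTS) {
        fprintf(stderr, "bogus roothub register, num_ports: %d\n", num_ports);
        fprintf(stderr, "num_ports 2nd try: %u\n", (unsigned)(fake_readl() & RH_A_NDP));
        fprintf(stderr, "num_ports 3rd try: %u\n", (unsigned)(fake_readl() & RH_A_NDP));
        return 0;
    }
    return num_ports;
}

/* Hypothetical status-change bitmap: bit 0 for the hub itself, bit
 * i + 1 for port i. With at most 15 ports only bits 0..15 (2 bytes)
 * are touched; an unchecked count of 255 would reach data[31]. */
static int fill_bitmap(uint8_t data[8], int num_ports, const uint32_t *port_changed)
{
    assert(num_ports >= 0 && num_ports <= MAX_ROOT_PORTS);
    for (int i = 0; i < num_ports; i++)
        if (port_changed[i])
            data[(i + 1) / 8] |= (uint8_t)(1u << ((i + 1) % 8));
    return (num_ports + 8) / 8;   /* bytes of data[] in use */
}
```

With canned reads of 0xffffffff the check logs three lines and returns 0, which corresponds to the patched driver bailing out instead of walking a bogus port count.
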
-------------------------------------------------------------------------------
  David Nash - dnash@chaos.demon.co.uk
