OK guys, you were right. The bug was in our code - sorry for trouble.
Turns out that while I was away, the problem was solved by someone else. The
problem is probably related to the fact that when we did
'spin_lock_irqsave(c,d)', 'd' was a global variable. The fix was to wrap the
call with another function and declare 'd' as local. I can't quite explain,
but I think that changing from a static to automatic variable made the
difference. My best guess is that since 'd' is passed by value and not by
reference, the macro expansion of spin_lock_irqsave() relies on the location
of 'd' in the stack and if 'd' was on the heap instead, it might get
trashed.
I would really like to hear your expert opinion on my assumption.
Thanks,
Shmulik.
-----Original Message-----
From: Andrew Morton [mailto:andrewm@uow.edu.au]
Sent: Wednesday, March 07, 2001 2:54 PM
To: Hen, Shmulik
Cc: 'LKML'
Subject: Re: spinlock help
"Hen, Shmulik" wrote:
>
> The kdb trace was accurate, we could actually see the e100 ISR pop from no
> where right in the middle of our ans_notify every time the TX queue would
> fill up. When we commented out the call to spin_*_irqsave(), it worked
fine
> under heavy stress for days.
>
> Is it possible it was something wrong with 2.4.0-test10 specifically ?
>
Sorry, no. If spin_lock_irqsave()/spin_unlock_irqrestore()
were accidentally reenabling interrupts then it would be
the biggest, ugliest catastrophe since someone put out a kernel
which forgot to flush dirty inodes to disk :)
Conceivably it was a compiler bug. Were you using egcs-1.1.2/x86?
-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Thu Mar 15 2001 - 21:00:07 EST