[Stuck in wait_on_bh problem] possibly fixed (?)

Simon Kirby (sim@netnation.com)
Fri, 5 Mar 1999 19:17:13 -0800 (PST)


---1362361889-35476071-920689719=:9192
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
Content-ID: <Pine.LNX.4.10.9903051911131.9192@peace.netnation.com>

Hey stuck in wait_on_bh people,

I stuck 2.2.2ac1 on the server that was getting the wait_on_bh stuckups
and left on vacation for a week. When I got back, the logs showed that the
machine had hit a bunch of SCSI timeouts (around 30). Those first started
to occur when the motherboard was swapped (probably a dumb problem
somewhere). More interesting, though: two of the timeouts were immediately
followed by wait_on_bh messages... and the box recovered! Perhaps something
in the SCSI code was fixed? The box has now been up 9 days and it hasn't
locked up yet.

Has anybody else noticed this as well?

Syslog messages:

...
Mar 1 11:00:31 peace kernel: (scsi0:-1:-1:-1) 6 commands found and queued for completion.
Mar 1 11:00:31 peace kernel:
Mar 1 11:00:31 peace kernel: wait_on_bh, CPU 1:
Mar 1 11:00:31 peace kernel: irq: 0 [0 0]
Mar 1 11:00:31 peace kernel: bh: 1 [1 0]
Mar 1 11:00:33 peace kernel: <[c010adb1]> <[c01737f3]> <[c017ae5f]> <[c017add0]> <[c015bb2e]> <[c017add0]> <[c015bd45]> <[c015bcac]> <6>(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 31.
Mar 1 11:00:33 peace kernel: (scsi0:0:3:0) Synchronous at 40.0 Mbyte/sec, offset 31.
Mar 1 11:00:33 peace kernel: (scsi0:0:4:0) Synchronous at 40.0 Mbyte/sec, offset 31.

and

...
Mar 2 14:00:43 peace kernel: (scsi0:-1:-1:-1) 6 commands found and queued for completion.
Mar 2 14:00:43 peace kernel:
Mar 2 14:00:43 peace kernel: wait_on_bh, CPU 1:
Mar 2 14:00:43 peace kernel: irq: 0 [0 0]
Mar 2 14:00:43 peace kernel: bh: 1 [1 0]
Mar 2 14:00:45 peace kernel: <[c010adb1]> <[c01737f3]> <[c017ae5f]> <[c017add0]> <[c015bb2e]> <[c017add0]> <[c015bd45]> <[c015bcac]> <6>(scsi0:0:1:0) Synchronous at 40.0 Mbyte/sec, offset 31.
Mar 2 14:00:45 peace kernel: (scsi0:0:3:0) Synchronous at 40.0 Mbyte/sec, offset 31.
Mar 2 14:00:45 peace kernel: (scsi0:0:4:0) Synchronous at 40.0 Mbyte/sec, offset 31.

Stack trace:

> c010ad74 T synchronize_bh
> c0173778 t tcp_v4_sendmsg
> c017add0 T inet_sendmsg
> c017ad28 T inet_recvmsg
> c015baa4 T sock_sendmsg
> c017ad28 T inet_recvmsg
> c015bcac t sock_write
> c015bc0c t sock_read

(Exactly the same for the last two times I decoded it, too...)

If you want to try ac7 and you use aic7xxx SCSI, you may want to apply the
attached patch to revert a hunk from the aic7xxx driver version 5.1.11
-> 5.1.12 patch that makes some changes to the code that sets the
"DSCOMMAND0" register -- I needed to do this to avoid seeing even more
SCSI timeouts all over the place (but only on some revisions of the 7880
chipset).
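
For readers who want to see what the hunk is about without applying it: the attached patch switches between two read-modify-write expressions on DSCOMMAND0, one that sets CACHETHEN and one that clears it (both set MPARCKEN and clear DPARCKEN). The following is only a sketch of that bit arithmetic in Python. The function names are hypothetical, and the bit values are assumptions for illustration; the authoritative definitions are in the driver's register header.

```python
# Bit masks for the DSCOMMAND0 register.
# ASSUMPTION: these values are illustrative; check the aic7xxx register
# definitions in the kernel tree before relying on them.
CACHETHEN = 0x80
DPARCKEN = 0x40
MPARCKEN = 0x20


def dscommand0_cache_on(reg):
    """Variant that sets CACHETHEN and MPARCKEN, and clears DPARCKEN."""
    return ((reg | CACHETHEN | MPARCKEN) & ~DPARCKEN) & 0xFF


def dscommand0_cache_off(reg):
    """Variant that sets MPARCKEN, and clears DPARCKEN and CACHETHEN."""
    return ((reg | MPARCKEN) & ~(DPARCKEN | CACHETHEN)) & 0xFF


if __name__ == "__main__":
    # Walk a few starting register values through both variants.
    for start in (0x00, 0x40, 0xC0):
        on = dscommand0_cache_on(start)
        off = dscommand0_cache_off(start)
        print(f"start=0x{start:02x}  cache_on=0x{on:02x}  cache_off=0x{off:02x}")
```

Whatever the register held before, the two variants differ only in the CACHETHEN bit, which is the bit to look at if the timeouts come and go with this hunk.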

I also attached my silly little perl script to less-painfully find stuff
in System.map when it's not in an oops format. ksymoops probably already
does it somehow, but why not.
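
The lookup itself is simple enough to sketch in a few lines. This is not the attached script (which is Perl and does a linear scan); it is a minimal Python sketch of the same idea using a binary search, with hypothetical function names. The sample map contains only the symbols quoted in the stack trace above, so with a real System.map the offsets near the end of a function may land in a different, unlisted symbol.

```python
import bisect
import re

# Sample System.map lines, limited to the symbols quoted in this message.
# A real System.map has thousands of entries.
SYSTEM_MAP = """\
c010ad74 T synchronize_bh
c015baa4 T sock_sendmsg
c015bc0c t sock_read
c015bcac t sock_write
c0173778 t tcp_v4_sendmsg
c017ad28 T inet_recvmsg
c017add0 T inet_sendmsg
"""


def load_symbols(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the last symbol starting at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address is below the first known symbol
    start, name = symbols[i]
    return name, addr - start


def decode_trace(symbols, log_line):
    """Pull every <[xxxxxxxx]> address out of a log line and resolve it."""
    found = re.findall(r"<\[([0-9a-f]{8})\]>", log_line)
    return [(hexaddr, resolve(symbols, int(hexaddr, 16))) for hexaddr in found]


if __name__ == "__main__":
    syms = load_symbols(SYSTEM_MAP)
    line = ("<[c010adb1]> <[c01737f3]> <[c017ae5f]> <[c017add0]> "
            "<[c015bb2e]> <[c017add0]> <[c015bd45]> <[c015bcac]>")
    for hexaddr, hit in decode_trace(syms, line):
        if hit is None:
            print(f"{hexaddr}  ???")
        else:
            print(f"{hexaddr}  {hit[0]}+0x{hit[1]:x}")
```

One design choice to be aware of: this sketch treats an address equal to a symbol's start as belonging to that symbol, so c017add0 resolves to inet_sendmsg+0x0 and c015bcac to sock_write+0x0, rather than to the preceding entries (inet_recvmsg, sock_read) listed in the trace above.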

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

---1362361889-35476071-920689719=:9192
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="linux-2.2.2ac7+aic7xxx_untweak_dscommand0_code.patch"
Content-Transfer-Encoding: 7BIT
Content-ID: <Pine.LNX.4.10.9903051908390.9192@peace.netnation.com>
Content-Description:
Content-Disposition: ATTACHMENT; FILENAME="linux-2.2.2ac7+aic7xxx_untweak_dscommand0_code.patch"

--- linux/drivers/scsi/aic7xxx.c	Fri Mar  5 19:06:22 1999
+++ linux/drivers/scsi/aic7xxx.c.orig	Fri Mar  5 11:30:45 1999
@@ -9227,17 +9227,11 @@
               /*
                * Set the DSCOMMAND0 register on these cards different from
                * on the 789x cards.  Also, read the SEEPROM as well.
+               */
               aic_outb(temp_p, (aic_inb(temp_p, DSCOMMAND0) |
                                 CACHETHEN | MPARCKEN) & ~DPARCKEN,
                        DSCOMMAND0);
-              aic7xxx_load_seeprom(temp_p, &sxfrctl1);
-              break;
-            case AHC_AIC7870:
             case AHC_AIC7895:
-               */
-              aic_outb(temp_p, (aic_inb(temp_p, DSCOMMAND0) |
-                                MPARCKEN) & ~(DPARCKEN | CACHETHEN),
-                       DSCOMMAND0);
               aic7xxx_load_seeprom(temp_p, &sxfrctl1);
               break;
           }
---1362361889-35476071-920689719=:9192
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="getsymbols.pl"
Content-Transfer-Encoding: 7BIT
Content-ID: <Pine.LNX.4.10.9903051908391.9192@peace.netnation.com>
Content-Description:
Content-Disposition: ATTACHMENT; FILENAME="getsymbols.pl"

#!/usr/bin/perl

# Cheap (linear-searching) but helpful System.map name matcher
# Example input (stdin):
#    c010ad74
#    c0173778 ...etc...
#
# Simon Kirby, 1999/03/05

$contextsize = 2;
$f = $ARGV[0];
$f = '/System.map' if ($f eq '');

open(IN,"< $f") or die "open(): $!\n";
while (<IN>){
   if (/^([a-z0-9]+) /){
      push(@em,$1);
      chomp;
      $em{$1} = $_;
   }
}
close(IN);

@em = sort @em;

while (<STDIN>){
   chomp;
   $d = hextodec($_);
   next unless ($d);
   print "$_:\n";
   @ring = ((undef) x ($contextsize + 1));
   $printnext = $found = 0;
   foreach (@em){
      if ($printnext > 0){
         $printnext--;
         print "    $em{$_}\n";
         last if (!$printnext);
      }
      if (!$found && hextodec($_) >= $d){
         $found++;
         $gotcha = pop(@ring);
         foreach $r (@ring){
            print "    $em{$r}\n";
         }
         print ">>> $em{$gotcha}\n";
         push(@output,$em{$gotcha});
         print "    $em{$_}\n";
         $printnext = $contextsize - 1;
      }
      shift(@ring);
      push(@ring,$_);
   }
}

print "\nTrace:\n";
foreach (@output){
   print "> $_\n";
}

exit 0;

sub hextodec {
   unpack("N", pack("H8", substr("0" x 8 . shift, -8)));
}
---1362361889-35476071-920689719=:9192--
