New ide + SMP oops (was RE: SMP + RAID)

Tom Livingston (tsl@volition.org)
Fri, 1 Oct 1999 15:50:06 -0700


This is a multi-part message in MIME format.

------=_NextPart_000_0132_01BF0C24.A1BAE7C0
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit

> I am building ide.2.2.12.19991001.patch now.

Hello, new situation to report...

Using linux-2.2.12 + ide.2.2.12.19991001.patch + ide.c spinlock patch (it
was missing from 10-01, and is needed to run in SMP) + running in SMP

hdparm -t /dev/hde & hdparm -t /dev/hdg
ran fine, both yielded 12MB/sec, good...

then started a bunch of dd's reading from both disks at the same time, and
producing md5sum's of the output. Soon after, a kernel NULL pointer
dereference OOPS'ed (attached)

After this OOPS, everything ran very very slowly. I started getting messages
on the console about once a second:
hde: lost interrupt
hdg: lost interrupt

I killed off the dd's and went to run an hdparm to see just how slow things
were. After five minutes, I killed it, thing must be pretty darn slow. But
wait, I can't kill it.. it's still showing up. After several -9's and
waiting a couple of minutes, hdparm disappears

So it seems using both channels on the pdc is still essentially broken with
the new patches, either way you go. This one is much gentler, though ;)

I will try to take a look at what else can be tested or done...

take care,

tom

------=_NextPart_000_0132_01BF0C24.A1BAE7C0
Content-Type: text/plain;
name="newest-ide-decoded-oops.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
filename="newest-ide-decoded-oops.txt"

Unable to handle kernel NULL pointer dereference at virtual address =
00000004
current->tss.cr3 =3D 02c6c000, %cr3 =3D 02c6c000
*pde =3D 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0182f56>]
EFLAGS: 00010286
eax: 00000000 ebx: 00000b28 ecx: ffffd000 edx: c022b000
esi: c024fca4 edi: c024fc68 ebp: 00000000 esp: c2c31cb4
ds: 0018 es: 0018 ss: 0018
Process dd (pid: 680, process nr: 57, stackpage=3Dc2c31000)
Stack: c4688ec0 c2c30000 000275ea 00000000 c0190f0c c024fca4 00000008 =
c024fc68
00000000 00000046 c024fc68 c0183229 c7fec480 c2c31cf4 00000000 =
00000246
00000286 c01832c8 c7fec480 c0180a88 00000000 c0127596 c024af68 =
00000004
Call Trace: [<c0190f0c>] [<c0183229>] [<c01832c8>] [<c0180a88>] =
[<c0127596>] [<c012b4fe>] [<c01699d1>]
[<c0169a2d>] [<c0111016>] [<c0111887>] [<c0181f98>] [<c018341a>] =
[<c01836d0>] [<c0109ddd>] [<c010e9ef>]
[<c0109f4f>] [<c0108a28>] [<c0106826>] [<c0111016>] [<c012ce8c>] =
[<c0126854>] [<c012673e>] [<c0107a6c>]
[<c010002b>]
Code: 0f b6 68 04 89 6c 24 18 c1 6c 24 18 06 8b 56 40 89 54 24 24

>>EIP: c0182f56 <ide_do_request+32e/588>
Trace: c0190f0c <do_rw_disk+0/2b8>
Trace: c0183229 <do_hwgroup_request+49/5c>
Trace: c01832c8 <do_ide3_request+14/28>
Trace: c0180a88 <unplug_device+48/58>
Trace: c0127596 <__wait_on_buffer+ca/140>
Trace: c012b4fe <block_read+44e/4f4>
Trace: c01699d1 <tcp_transmit_skb+375/3dc>
Trace: c0169a2d <tcp_transmit_skb+3d1/3dc>
Trace: c0109f4f <do_IRQ+3b/5c>
Trace: c010002b <startup_32+2b/a4>
Code: c0182f56 <ide_do_request+32e/588> 00000000 <_EIP>: =
<=3D=3D=3D
Code: c0182f56 <ide_do_request+32e/588> 0: 0f b6 68 04 =
movzbl 0x4(%eax),%ebp <=3D=3D=3D
Code: c0182f5a <ide_do_request+332/588> 4: 89 6c 24 18 =
movl %ebp,0x18(%esp,1)
Code: c0182f5e <ide_do_request+336/588> 8: c1 6c 24 18 06 =
shrl $0x6,0x18(%esp,1)
Code: c0182f63 <ide_do_request+33b/588> d: 8b 56 40 =
movl 0x40(%esi),%edx
Code: c0182f66 <ide_do_request+33e/588> 10: 89 54 24 24 =
movl %edx,0x24(%esp,1)

------=_NextPart_000_0132_01BF0C24.A1BAE7C0--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/