oopses in SCSI mid-level, 1 source

Douglas Gilbert (dgilbert@interlog.com)
Sun, 21 Feb 1999 20:50:45 -0500


I was doing some stress testing on a sg driver that I'm writing
and was able to oops the SCSI mid-level several times.

The oops log post-processed by ksymoops is at the bottom of this
email. It seems to be coming from line 1925 of scsi.c . The function
scsi_build_commandblocks() calls scsi_init_malloc() and _assumes_
it returns a valid pointer. That function calls either kmalloc() or
__get_free_pages() with GFP_ATOMIC (perhaps or-ed with GFP_DMA if
an ISA adapter is being used). Those calls seem a lot more
likely to return 0 in 2.2 than the did in 2.0 :-( Hence the oops.
It's my guess that several emails to this newsgroup reporting oopses
in the scsi sub-system are related to this particular problem.

It is not clear to me what that code should do when 'SCpnt' gets
0 assigned to it. Could the queue_length be shortened to that value,
perhaps panic-ing if that yields a zero queue_length? Could not
the scsi_dma_pool be used? In any case, that code should change.

While on the subject of oops-es, I can get a repeatable "rock solid"
oops using the following sequence of pathological code:
"insmod sg; sg_close_early; rmmod sg"
The 'sg_close_early' write()s off a scsi command but closes the
file descriptor before the matching read(). [This roughly
simulates an unexpected CTRL-C.] It is pretty obvious what is
going on: the mid-level bh handler finally gets an "interrupt"
and then calls sg but it's no longer there. The scsi mid-level
no longer allows a high-level module (eg sg) to abort a command
"in flight". What to do?

Oops printout for scsi mid-level oops follows:

Feb 21 10:48:14 frig kernel: Detected scsi CD-ROM sr0 at scsi1, channel
0, id 4, lun 0
Feb 21 10:48:15 frig kernel: Detected scsi generic sge at scsi1, channel
0, id 5, lun 0

Warning in compare_ksyms_lsmod, module sg is in lsmod but not in ksyms,
probably no symbols exported
Feb 21 10:48:15 frig kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000058
Feb 21 10:48:15 frig kernel: current->tss.cr3 = 016df000, `r3 = 016df000
Feb 21 10:48:15 frig kernel: *pde = 00000000
Feb 21 10:48:15 frig kernel: Oops: 0002
Feb 21 10:48:15 frig kernel: CPU: 0
Feb 21 10:48:15 frig kernel: EIP: 0010:[<c01bf9fb>]
Feb 21 10:48:15 frig kernel: EFLAGS: 00010246
Feb 21 10:48:15 frig kernel: eax: 00000000 ebx: c0094c00 ecx:
00000005 edx: 00000000
Feb 21 10:48:15 frig kernel: esi: 00000002 edi: 00000058 ebp:
c0006000 esp: c170bf14
Feb 21 10:48:15 frig kernel: ds: 0018 es: 0018 ss: 0018
Feb 21 10:48:15 frig kernel: Process insmod (pid: 970, process nr: 67,
stackpage=c170b000)
Feb 21 10:48:15 frig kernel: Stack: c484addc ffffffea c484addc c01c0ef1
c0094c00 c484addc 00000000 c484804b
Feb 21 10:48:15 frig kernel: c01c1014 c484addc c4848000 c48496e6
00000004 c484addc c0113693 c170a000
Feb 21 10:48:15 frig kernel: 0804e698 c4848000 bffffdc8 c1556000
c484ae68 c484ae78 00000002 c1556000
Feb 21 10:48:15 frig kernel: Call Trace: [<c484addc>] [<c484addc>]
[<c01c0ef1>] [<c484addc>] [<c484804b>] [<c01c1014>] [<c484addc>]
Feb 21 10:48:15 frig kernel: [<c4848000>] [<c48496e6>]
[<c484addc>] [<c0113693>] [<c4848000>] [<c484ae68>] [<c484ae78>]
[<c4843000>]
Feb 21 10:48:15 frig kernel: [<c4848048>] [<c0107b0c>]
[<c4848000>]
Feb 21 10:48:15 frig kernel: Code: f3 ab 89 2a 89 5a 08 8a 43 1c 88 42
40 8a 43 1d 88 42 41 8a

>>EIP: c01bf9fb <scsi_build_commandblocks+5b/f0>
Trace: c484addc <END_OF_CODE+174d4/????>
Trace: c484addc <END_OF_CODE+174d4/????>
Trace: c01c0ef1 <scsi_register_device_module+9d/c8>
Trace: c484addc <END_OF_CODE+174d4/????>
Trace: c484804b <END_OF_CODE+14743/????>
Trace: c01c1014 <scsi_register_module+4c/5c>
Trace: c484addc <END_OF_CODE+174d4/????>
Trace: c4848000 <END_OF_CODE+146f8/????>
Trace: c4848048 <END_OF_CODE+14740/????>
Code: c01bf9fb <scsi_build_commandblocks+5b/f0> 00000000 <_EIP>:
Code: c01bf9fb <scsi_build_commandblocks+5b/f0> 0: f3 ab
repz stosl %eax,%es:(%edi)
Code: c01bf9fd <scsi_build_commandblocks+5d/f0> 2: 89 2a
movl %ebp,(%edx)
Code: c01bf9ff <scsi_build_commandblocks+5f/f0> 4: 89 5a 08
movl %ebx,0x8(%edx)
Code: c01bfa02 <scsi_build_commandblocks+62/f0> 7: 8a 43 1c
movb 0x1c(%ebx),%al
Code: c01bfa05 <scsi_build_commandblocks+65/f0> a: 88 42 40
movb %al,0x40(%edx)
Code: c01bfa08 <scsi_build_commandblocks+68/f0> d: 8a 43 1d
movb 0x1d(%ebx),%al
Code: c01bfa0b <scsi_build_commandblocks+6b/f0> 10: 88 42 41
movb %al,0x41(%edx)
Code: c01bfa0e <scsi_build_commandblocks+6e/f0> 13: 8a 00
movb (%eax),%al

2 warnings issued. Results may not be reliable.

------------------------------------------------------------------
Douglas Gilbert [dgilbert@interlog.com or dougg@triode.net.au]
48 Windsor Court Road Web: www.interlog.com/~dgilbert
Thornhill, Ontario L3T 4Y5, Canada Tel: +1 905 771 6151

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/