RE: oops with USB Storage on 2.6.14

From: goggin, edward
Date: Tue Nov 08 2005 - 11:25:26 EST


I've hit a bug like this several times on 2.6.14-rc4 while testing
dm-multipath's reaction to uevents generated by forcing Fibre Channel
transport failures. Such a failure leads to the scsi device being
detached and the queuedata pointer in the device's queue being reset
in scsi_device_dev_release. The fix I've used is below, and it seems
to work well for me. I was going to post this patch to dm-devel today
or tomorrow anyway.

drivers/scsi/scsi_lib.c:scsi_next_command()
Call scsi_device_get and scsi_device_put around the calls to
scsi_put_command and scsi_run_queue so that the scsi host structure
will not be de-allocated between scsi_put_command and scsi_run_queue.

*** ../base/linux-2.6.14-rc4/drivers/scsi/scsi_lib.c Mon Oct 10 20:19:19 2005
--- drivers/scsi/scsi_lib.c Thu Nov 3 13:30:03 2005
***************
*** 592,601 ****

void scsi_next_command(struct scsi_cmnd *cmd)
{
! struct request_queue *q = cmd->device->request_queue;

scsi_put_command(cmd);
scsi_run_queue(q);
}

void scsi_run_host_queues(struct Scsi_Host *shost)
--- 592,611 ----

void scsi_next_command(struct scsi_cmnd *cmd)
{
! struct scsi_device *sdev = cmd->device;
! struct request_queue *q = sdev->request_queue;
!
! // need to hold a reference on the device before we let go of the cmd
! if (scsi_device_get(sdev)) {
! scsi_put_command(cmd);
! return; // maybe sdev_state == SDEV_CANCEL, SDEV_DEL
! }

scsi_put_command(cmd);
scsi_run_queue(q);
+
+ // ok to remove device now
+ scsi_device_put(sdev);
}

void scsi_run_host_queues(struct Scsi_Host *shost)
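
For readers who want to see why the extra get/put pair matters, here is a
minimal user-space sketch of the ordering problem. It is not kernel code:
every model_* name is a hypothetical stand-in, and it assumes that
scsi_put_command drops the device reference held by the command, so that
it can be the last reference once the device has been detached.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the request queue owned by the device. */
struct model_queue {
	int runs;		/* how many times the queue was run */
};

/* Hypothetical stand-in for struct scsi_device. */
struct model_device {
	int refcount;		/* references currently held */
	bool released;		/* true once the last reference is dropped */
	bool deleted;		/* models SDEV_CANCEL / SDEV_DEL */
	struct model_queue queue;
};

/* Models scsi_device_get(): refuses a device that is going away. */
static int model_device_get(struct model_device *d)
{
	if (d->deleted)
		return -1;
	d->refcount++;
	return 0;
}

/* Models scsi_device_put(): the last put releases the device. */
static void model_device_put(struct model_device *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->released = true;
}

/* Models scsi_put_command(): assumed to drop the command's reference. */
static void model_put_command(struct model_device *d)
{
	model_device_put(d);
}

/* Models scsi_run_queue(): returns false where the kernel would oops. */
static bool model_run_queue(struct model_device *d)
{
	if (d->released)
		return false;	/* queue already torn down */
	d->queue.runs++;
	return true;
}

/* Ordering before the patch: the put may release the device first. */
static bool next_command_unpatched(struct model_device *d)
{
	model_put_command(d);
	return model_run_queue(d);
}

/* Ordering after the patch: pin the device across both calls. */
static bool next_command_patched(struct model_device *d)
{
	bool ok;

	if (model_device_get(d)) {
		model_put_command(d);
		return false;	/* device going away, skip the queue run */
	}
	model_put_command(d);
	ok = model_run_queue(d);
	model_device_put(d);	/* ok to release the device now */
	return ok;
}
```

Starting from a device whose only reference is the one held by the
command, next_command_unpatched() reports a failed queue run, while
next_command_patched() runs the queue once and only then lets the device
be released.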


> -----Original Message-----
> From: linux-scsi-owner@xxxxxxxxxxxxxxx
> [mailto:linux-scsi-owner@xxxxxxxxxxxxxxx] On Behalf Of Andrew Morton
> Sent: Monday, November 07, 2005 11:41 PM
> To: Masanari Iida
> Cc: linux-kernel@xxxxxxxxxxxxxxx;
> linux-usb-devel@xxxxxxxxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: oops with USB Storage on 2.6.14
>
> Masanari Iida <standby24x7@xxxxxxxxx> wrote:
> >
> > Hello,
> > I updated my system's kernel from 2.6.13.2 to 2.6.14,
> > and it now oopses when I connect my digital camera via USB
> > as a USB storage device.
> > I went back to 2.6.14-rc1, and the same panic still happens.
> > With 2.6.13.2 and earlier, the kernel worked as expected.
> >
> > CPU: Intel P4 (2.4 GHz)
> > USB device: Pentax Optio S40
> >
> > Unable to handle kernel paging request at virtual address dc9d1f4c
> > printing eip:
> > c02b44cc
> > *pde = 00073067
> > *pte = 1c9d1000
> > Oops: 0000 [#1]
> > SMP DEBUG_PAGEALLOC
> > Modules linked in: autofs e100 ipt_LOG ipt_state ip_conntrack
> > ipt_recent iptable_filter ip_tables video rtc
> > CPU: 1
> > EIP: 0060:[<c02b44cc>] Not tainted VLI
> > EFLAGS: 00010286 (2.6.14)
> > EIP is at scsi_run_queue+0xc/0xd0
> > eax: 00000001 ebx: dc9d1e3c ecx: d6b67910 edx: dc9d1e3c
> > esi: d5048eb0 edi: dc9d1e3c ebp: c1507e98 esp: c1507e84
> > ds: 007b es: 007b ss: 0068
> > Process ksoftirqd/1 (pid: 6, threadinfo=c1506000 task=dfe2dad0)
> > Stack: 00000292 de3a7bf8 dc9d1e3c d5048eb0 dc9d1e3c c1507ea8 c02b4612 dc9d1e3c
> > da51bf60 c1507ecc c02b473f d5048eb0 00000000 00000024 00000286 00000001
> > d5048eb0 00000000 c1507f10 c02b4b2e d5048eb0 00000000 00000024 00000001
> >
> > Call Trace:
> > [<c0103abf>] show_stack+0x7f/0xa0
> > [<c0103c72>] show_registers+0x162/0x1d0
> > [<c0103e90>] die+0x100/0x1a0
> > [<c039d7ae>] do_page_fault+0x31e/0x640
> > [<c0103763>] error_code+0x4f/0x54
> > [<c02b4612>] scsi_next_command+0x22/0x30
> > [<c02b473f>] scsi_end_request+0xcf/0xf0
> > [<c02b4b2e>] scsi_io_completion+0x26e/0x470
> > [<c02b4fc7>] scsi_generic_done+0x37/0x50
> > [<c02af9e5>] scsi_finish_command+0x85/0xa0
> > [<c02af89c>] scsi_softirq+0xcc/0x140
> > [<c0122085>] __do_softirq+0xd5/0xf0
> > [<c01220d8>] do_softirq+0x38/0x40
> > [<c0122685>] ksoftirqd+0x95/0xe0
> > [<c0131cfa>] kthread+0xba/0xc0
> > [<c0100ecd>] kernel_thread_helper+0x5/0x18
> > Code: f0 8b 42 44 e8 16 7f 0e 00 89 45 ec 89 1c 24 e8 6b b7 ff ff eb aa 89 f6 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 08 8b 55 08 <8b> 82 10 01 00 00 8b 38 f6 80 85 01 00 00 80 0f 85 9e 00 00 00
> > <0>Kernel panic - not syncing: Fatal exception in interrupt
> >
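
The faulting address in the oops above can be cross-checked against its
register dump and Code: bytes. The marked instruction, 8b 82 10 01 00 00,
is a 32-bit load from a displacement off a base register, and the sketch
below decodes it and adds the displacement to the dumped edx value. The
helper name and the constants' names are made up for illustration; the
byte values and addresses are copied from the oops. Which structure
member lives at that offset is not something the dump itself proves, so
reading it as the queue's queuedata pointer is only an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode `mov disp32(%base),%reg`: opcode 0x8b, a ModRM byte with
 * mod == 2 (32-bit displacement), then four little-endian displacement
 * bytes. Returns 0 on success, -1 for any other encoding.
 */
static int decode_mov_disp32(const uint8_t *insn, uint32_t *disp,
			     unsigned *base_reg)
{
	uint8_t modrm;

	if (insn[0] != 0x8b)
		return -1;
	modrm = insn[1];
	if ((modrm >> 6) != 2)
		return -1;	/* not the disp32 addressing form */
	if ((modrm & 7) == 4)
		return -1;	/* SIB byte follows; not handled here */

	*base_reg = modrm & 7;	/* 0=eax 1=ecx 2=edx 3=ebx ... */
	*disp = (uint32_t)insn[2] |
		((uint32_t)insn[3] << 8) |
		((uint32_t)insn[4] << 16) |
		((uint32_t)insn[5] << 24);
	return 0;
}

/* Bytes starting at the <8b> marker in the Code: line of the oops. */
static const uint8_t faulting_insn[6] = {
	0x8b, 0x82, 0x10, 0x01, 0x00, 0x00
};

/* Values copied from the register dump and the first oops line. */
#define OOPS_EDX	0xdc9d1e3cu
#define OOPS_FAULT_ADDR	0xdc9d1f4cu
```

Decoding gives base register 2 (edx) and displacement 0x110, and
0xdc9d1e3c + 0x110 is 0xdc9d1f4c, the address the kernel could not page
in. That is consistent with scsi_run_queue reading a field of the queue
pointer it was passed after the memory behind it had been freed, which
DEBUG_PAGEALLOC turns into an immediate fault.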
>
> Has there been any progress on this?
>
> If not, can you please test the latest snapshot from
> ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots and if
> it still fails, raise a bug at bugzilla.kernel.org?
>
> Thanks.