RE: Panic at _blk_run_queue on 2.6.32

From: Rich, Jason
Date: Tue Sep 10 2013 - 16:46:11 EST


> -----Original Message-----
> From: Rich, Jason
> Sent: Tuesday, September 10, 2013 1:04 PM
> To: 'Willy Tarreau'
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: Panic at _blk_run_queue on 2.6.32
>
> > -----Original Message-----
> > From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Willy Tarreau
> > Sent: Wednesday, July 10, 2013 3:27 PM
> > To: Rich, Jason
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: Panic at _blk_run_queue on 2.6.32
> >
> > Hi Jason,
> >
> > On Tue, Jul 09, 2013 at 05:42:29PM +0000, Rich, Jason wrote:
> > > Greetings,
> > > I've recently encountered an issue where multiple hosts are failing
> > > to boot up about 1/5 of the time. So far I have confirmed this
> > > issue on three separate host machines. The issue presents itself
> > > after updating 2.6.32.39 to patch 50 and patch 61.
> > > Both patch levels result in the failure described below. Since this
> > > occurs on multiple hosts, I feel I can safely rule out hardware.
> >
> > First, thank you for your very detailed report. Do you think you could
> > narrow this down to a specific kernel version? Given that there are
> > exactly 10 versions between .39 and .50, I think that a version-level
> > bisect would take 3 or 4 builds (so probably around 20 reboots).
> >
> > It would help us spot the faulty patch. Right now, there are 546
> > patches between .39 and .50 so it's quite hard to find the culprit,
> > even with your full trace. That does not mean we'll immediately spot
> > it, maybe a deeper bisect will be needed, but it should be easier.
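
For reference, the estimate above can be reproduced with a small sketch. It assumes a plain binary search over the ten releases and that a bad kernel fails to boot independently about 1 time in 5, as described in the original report. The function names are illustrative only and not part of any tool.

```python
import math

def bisect_builds(candidates: int) -> int:
    """Worst-case number of builds for a binary search over `candidates` releases."""
    return math.ceil(math.log2(candidates))

def miss_probability(fail_rate: float, boots: int) -> float:
    """Chance that a bad kernel boots cleanly `boots` times in a row (assumes independent boots)."""
    return (1.0 - fail_rate) ** boots

builds = bisect_builds(10)        # 10 releases between .39 and .50
boots_per_build = 20 // builds    # spread roughly 20 reboots across the builds
print(builds, boots_per_build, round(miss_probability(0.2, boots_per_build), 3))
```

Under these assumptions a handful of clean boots does not prove a release is good, so it may be worth rebooting a "good" candidate more often than a failing one before moving on.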
> >
> > > It is also of note that I have not seen this behavior on the 3.4.26
> > > kernel, or on any of my 32bit hosts.
> >
> > This is good news, because we're probably missing one fix from a
> > more recent version that addressed a similar regression and that we
> > might backport into 2.6.32.62.
> >
> > > That said, I have to support this software release (which runs on
> > > the 2.6 kernel) for at least another two years.
> >
> > Be careful on this point, 2.6.32 is planned for EOL next year:
> >
> > https://www.kernel.org/category/releases.html
> >
> > You might want to consider migrating to a supported distro kernel or
> > to 3.2 instead. That said, if you carefully follow the updates from
> > later kernels, you might prefer to maintain your own backports of the
> > patches that are relevant to your usage.
> >
> > Best regards,
> > Willy
> >
>
> Greetings Willy,
> You helped me out with this particular issue about 2 months ago. What we
> found is that my particular panic appears to be addressed by a specific
> commit you referred me to:
> b485462 [SCSI] Stop accepting SCSI requests before removing a device
>
> Without going into too much detail, I'm not able to jump directly to that hash
> because I have about 7 different drivers failing to compile due to other
> changes between 2.6.32.61 and that hash. In particular, some header files
> were renamed, others deleted and replaced by newer features. To go
> through and update my proprietary drivers is as big of a headache as just
> getting this scsi panic fixed on top of patch 61.
>
> I've spent the last couple of weeks trying to get the scsi fix applied on
> top of patch 61 and am having a very difficult time. There are so many
> dependencies on prior commits to the scsi code that it is quite difficult
> to determine what exactly I need.
>
> I'm hoping you might be able to help me out with some advice, or perhaps
> you are familiar enough with the scsi code to help me apply the concept of
> the fix to the top of patch 61. I have attached the patch I've come up with so
> far, but this is obviously missing other dependencies as I keep ending up with
> panics. It goes without saying that this code is very foreign to me and I don't
> completely understand what it is doing.
>
> I know your time is valuable so I've attached the patch I've been working on
> so far, however, this code causes its own kernel panic and should not be run
> on a live system. That said, perhaps it will give you a baseline as to what I'm
> trying to do. Again, this patch is based on the official 2.6.32.61 tag.
>
> Thanks for any help,
> Jason Rich

Apologies, I had been tweaking that patch file and didn't realize I had corrupted it. I deleted a line in the scsi_sysfs.c area of the diff and forgot to update the line count in the hunk header. It should be +912,24 (not 25):
+++ linux-2.6.32.new/drivers/scsi/scsi_sysfs.c 2013-09-09 14:01:38.249104690 -0500
@@ -912,16 +912,24 @@

I have attached the corrected patch file; I don't want to waste your time with the old one. Again, apologies.
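
For anyone hand-editing a diff the same way, here is a minimal sketch of the consistency check that would have caught the problem. It assumes the standard unified diff layout, where the header reads `@@ -old_start,old_count +new_start,new_count @@` and the counts must match the number of context, removed, and added lines in the hunk body. The function name and the sample hunk are hypothetical and are not taken from the attached patch.

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def check_hunk(header, body):
    """Return True if the line counts declared in a hunk header match the hunk body."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    # An omitted count defaults to 1 in the unified diff format.
    old_declared = int(m.group(2) or 1)
    new_declared = int(m.group(4) or 1)
    old_seen = new_seen = 0
    for line in body:
        if line.startswith("\\"):      # "\ No newline at end of file" marker
            continue
        if line.startswith("-"):
            old_seen += 1              # removed line: old side only
        elif line.startswith("+"):
            new_seen += 1              # added line: new side only
        else:
            old_seen += 1              # context line (or empty line): both sides
            new_seen += 1
    return (old_seen, new_seen) == (old_declared, new_declared)

# Hypothetical three-line hunk: two context lines and one added line.
sample_body = [" int a;", "+int b;", " int c;"]
print(check_hunk("@@ -1,2 +1,3 @@", sample_body))   # counts agree
print(check_hunk("@@ -1,2 +1,4 @@", sample_body))   # new-side count is off by one
```

Deleting a `+` line from a hunk without lowering the new-side count leaves the header and body out of step, which is the kind of mismatch described above.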

Attachment: 0001-scsi_panic.patch
Description: 0001-scsi_panic.patch