Re: Possible mptsas regression post 3.5.0

From: John Drescher
Date: Mon Aug 27 2012 - 10:10:58 EST


On Fri, Aug 24, 2012 at 3:34 PM, John Drescher <drescherjm@xxxxxxxxx> wrote:
> On Thu, Aug 23, 2012 at 1:34 PM, John Drescher <drescherjm@xxxxxxxxx> wrote:
>> Over the last few weeks I have done some reliability testing with
>> mdraid6 on a machine with 2 LSI mptsas controllers and 13 SATA I
>> drives. My testing involved physically hot-removing a drive, forcing
>> the RAID to grab a spare and rebuild. This worked great the 5 or
>> so times I did this on gentoo-sources-3.5.0 and lower. However, any
>> attempt to do this on gentoo-sources-3.5.1 or even the 3.6-rc2 git
>> resulted in a total lockup of the array. I originally thought this was
>> an mdadm regression and posted about that last week here:
>>
>> https://lkml.org/lkml/2012/8/17/503
>>
>
> I have bisected the kernel a few times now, and the problem was
> introduced between
> 3.5.0.00007-ged29dbd and
> 3.5.0.00015-g4d9157e
>
> After the raid rebuilds again I will bisect again and see if I can
> narrow it down to the exact patch.
>

I have bisected it down to the following patch:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[10f8d5b86743b33d841a175303e2bf67fd620f42] SCSI: fix hot unplug vs async scan race

It appears this patch caused the bad behavior, although I have not
verified that yet. I am rebuilding the array (which takes ~2 hours) on
the last good kernel from the bisect.
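
For anyone following along who has not used bisection before, here is a
minimal sketch in Python of the binary search that git bisect performs.
It is only an illustration under stated assumptions: the history is
treated as linear (real kernel history has merges), the commit labels
are just hypothetical counters, and is_bad stands in for the manual test
used here (hot-remove a drive and see whether the array locks up).

```python
def bisect_first_bad(commits, is_bad):
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi], tests


# Hypothetical linear history: counters 7..15, standing in for a range
# with a known-good start and a known-bad end.
commits = list(range(7, 16))
culprit = 12  # made-up position of the offending commit


def is_bad(commit):
    # Stand-in for the real test: does the array lock up on this build?
    return commit >= culprit


first_bad, tests = bisect_first_bad(commits, is_bad)
print(first_bad, tests)
```

Each test halves the remaining range, so the cost is dominated by the
time per test (here, a ~2 hour rebuild) rather than by the number of
commits.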

--
John M. Drescher