Re: [SCSI REGRESSION] 3.10.2 or 3.10.3: arcmsr failure at bootup/ early userspace transition

From: Douglas Gilbert
Date: Mon Jul 29 2013 - 20:29:06 EST


On 13-07-29 05:09 PM, Nix wrote:
On 29 Jul 2013, Bernd Schubert uttered the following:

On 07/29/2013 03:05 PM, Nix wrote:
On 29 Jul 2013, Bernd Schubert said:

Hi Nick,

On 07/29/2013 12:10 PM, Nick Alcock wrote:
arcmsr0: abort device command of scsi id = 0 lun = 1
arcmsr0: abort device command of scsi id = 0 lun = 0
arcmsr: executing bus reset eh.....num_resets=0, num_[...]

arcmsr0: wait 'abort all outstanding command' timeout
arcmsr0: executing hw bus reset ....
arcmsr0: waiting for hw bus reset return, retry=0
arcmsr0: waiting for hw bus reset return, retry=1
Areca RAID Controller0: F/W V1.46 2009-01-06 & Model ARC-1210
arcmsr: scsi bus reset eh returns with success
[and back to the top of the error messages again, apparently forever,
not that the machine would be much use without its RAID array even
if this loop terminated at some point, so I only gave it a couple
of minutes]

The failure happens precisely at the moment we transition to early
userspace, so presumably userspace I/O is failing (or something related
to raw device access, perhaps, since the first thing it does is a
vgscan).

I haven't bisected yet (sorry, I have work to do, which means this
machine must be running right now), but nothing has changed in the
arcmsr controller, nor in SCSI-land, excepting:

commit 98dcc2946adbe4349ef1ef9b99873b912831edd4
Author: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Date: Thu Jun 6 22:15:55 2013 -0400

I can now confirm that reverting this commit causes this problem to go
away, and my machine boots fine again.

Please revert (and figure out what is wrong, so that 3.11 doesn't
implode in the same way; I'm happy to assist...)

Hi,
Please supply the information that Martin Petersen asked
for.

I just examined a more recent Areca SAS RAID controller
and would describe it as the SCSI device from hell. One solution
to this problem is to modify the arcmsr driver so it returns
a more consistent set of lies to the management SCSI commands that
Martin is asking about.

Doug Gilbert
