Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen

From: Gwendal Grignou
Date: Tue Sep 23 2008 - 14:14:59 EST


About the ata1.00 problem, as reported in the bugzilla bug: I would try
disabling NCQ to see if it helps. Your disks' firmware might not fully
support it.

You can either add the parameter "libata.force=noncq" to the kernel
command line at boot, or set queue_depth to 1 for all the Seagate drives
behind the Marvell MV88SX6081 controller.
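
For the queue_depth route, something along these lines should do it.
This is only a sketch: the function name and the SYSFS_BLOCK override
are made up for illustration, and sda..sdd are placeholders for
whichever names your Seagate drives got. On a real system the attribute
lives at /sys/block/<disk>/device/queue_depth and writing it needs root.

```shell
#!/bin/sh
# Set queue_depth to 1 for the listed disks, so that only one command
# is outstanding per drive (no NCQ queueing).
# SYSFS_BLOCK is a hypothetical override so the function can be tried
# against a scratch directory; it defaults to the real /sys/block.
set_queue_depth_1() {
    sysfs_block=${SYSFS_BLOCK:-/sys/block}
    for disk in "$@"; do
        f="$sysfs_block/$disk/device/queue_depth"
        if [ -w "$f" ]; then
            echo 1 > "$f"
            echo "$disk: queue_depth=$(cat "$f")"
        else
            echo "$disk: $f not writable, skipped" >&2
        fi
    done
}

# Example call (placeholder device names, run as root):
# set_queue_depth_1 sda sdb sdc sdd
```

The setting does not survive a reboot, so if it helps you would want
to reapply it from a boot script or use the kernel parameter instead.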

About ata5.00: someone, probably in user space, is trying to do a
SMART ENABLE operation, but the device ignores it. I don't know which
device you are using, but I assume it does not support the ATA SMART
feature set. A timeout is an acceptable but not a nice way to answer;
aborting the command would have been better. Check if there is a
firmware upgrade for your device.
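
In case it is useful, this is roughly how the "cmd" line in the ata5.00
log quoted below reads as SMART ENABLE: the first byte is the command
register (b0 is SMART) and the byte after the slash is the feature
register (d8 is SMART ENABLE OPERATIONS). The helper below is a rough
sketch with a made-up name, and it only knows a handful of SMART
subcommands.

```shell
#!/bin/sh
# Decode the command and feature bytes of a libata "cmd" taskfile string.
# decode_ata_cmd is a hypothetical helper, not part of any tool.
decode_ata_cmd() {
    tf=$1
    cmd=${tf%%/*}          # text before the first "/": command register
    rest=${tf#*/}          # everything after the first "/"
    feature=${rest%%:*}    # first field after the "/": feature register
    case "$cmd/$feature" in
        b0/d0) echo "SMART READ DATA" ;;
        b0/d8) echo "SMART ENABLE OPERATIONS" ;;
        b0/d9) echo "SMART DISABLE OPERATIONS" ;;
        b0/da) echo "SMART RETURN STATUS" ;;
        b0/*)  echo "SMART (subcommand 0x$feature)" ;;
        *)     echo "command 0x$cmd, feature 0x$feature (not decoded here)" ;;
    esac
}

# Taskfile copied from the ata5.00 log line
decode_ata_cmd "b0/d8:00:00:4f:c2/00:00:00:00:00/00"
# prints: SMART ENABLE OPERATIONS
```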

Gwendal.

On Mon, Sep 22, 2008 at 6:26 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> From Brian's earlier e-mail:
>
>> > I filed this kernel bug:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=462425
>
>
> On Mon, 22 Sep 2008, Justin Piszcz wrote:
>
>> I could not agree more.
>>
>> CC'ing the relevant mailing lists to see if someone out there has any idea
>> what more we could do as this has been affecting you (more so than myself,
>> but I would still like to get some sort of resolution as well, as it still
>> happens to me too):
>>
>> Similar, but not the same issue:
>>
>> Sep 17 20:20:05 p34 kernel: [1422169.440538] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> Sep 17 20:20:05 p34 kernel: [1422169.440549] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
>> Sep 17 20:20:05 p34 kernel: [1422169.440551] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Sep 17 20:20:05 p34 kernel: [1422169.440556] ata5.00: status: { DRDY }
>> Sep 17 20:20:05 p34 kernel: [1422169.440561] ata5: hard resetting link
>> Sep 17 20:20:06 p34 kernel: [1422169.744980] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Sep 17 20:20:06 p34 kernel: [1422169.770448] ata5.00: configured for UDMA/133
>> Sep 17 20:20:06 p34 kernel: [1422169.770461] ata5: EH complete
>>
>> (The log above is from kernel 2.6.23.3.)
>>
>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>
>>> Works fine... Also works under heavy load with only 4 drives. I could
>>> only get it to fail by doing a RAID resync with 4 drives, except for the
>>> newer kernel, which dies pretty easily.
>>>
>>> What is really frustrating about it is that, short of the bugzilla bug I
>>> submitted, I don't know who would be willing to listen. A lot of the Google
>>> hits when searching for "action 0x2 frozen" are related to a particular CDROM
>>> drive, or general hardware failure. I really don't think that is the case
>>> here, but I bet most of the kernel people think the same thing, so they have
>>> no reason to care...
>>>
>>>
>>> Sent: Monday, September 22, 2008 7:04 AM
>>> Subject: Re: Hardware RAID
>>>
>>>
>>>> What about if you just 'stress' one drive?
>>>>
>>>> 1. dd if=/dev/sda of=/dev/null bs=1M &
>>>> Does it do it?
>>>> 2. Same thing for sdb?
>>>>
>>>> Justin.
>>>>
>>>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>>>
>>>>> I killed smartd for testing. Other than that, it seems entirely load
>>>>> based. Anything disk intensive (backups, raid resync, a bunch of spam comes
>>>>> in at once, etc.) makes it fail...
>>>>>
>>>>> Sent: Monday, September 22, 2008 6:29 AM
>>>>> Subject: Re: Hardware RAID
>>>>>
>>>>>
>>>>>> While the error happens for me as well, it does NOT happen with that
>>>>>> much consistency. If I were you, I would start testing different kernels and
>>>>>> run in single-user mode (or as close to it as you can) to see if you can
>>>>>> narrow down what is causing it. Also boot Knoppix and see if it occurs there.
>>>>>>
>>>>>> Justin.
>>>>>>
>>>>>> On Mon, 22 Sep 2008, Brian Rademacher wrote:
>>>>>>
>>>>>>> Doesn't look like a very powerful RAID card, so I may pass on it. I
>>>>>>> don't think it will have the bandwidth to run as fast as the software RAID
>>>>>>> currently does, since it's only a 64-bit/66 MHz PCI slot...
>>>>>>>
>>>>>>> I hate to do the hardware RAID thing, but this error is killing me:
>>>>>>> Sep 21 12:05:19 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 12:32:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 12:41:34 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 12:58:22 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 13:11:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 13:23:55 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 13:54:23 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 15:15:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 15:44:06 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>> Sep 21 21:15:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
>>>>>>>
>>>>>>> And at this point, I can either regress to a 4-drive RAID and not
>>>>>>> update the kernel, or move forward with hardware...
>>>>>>>
>>>>>>> I don't see a fix coming any time soon, but maybe I'll try one of the
>>>>>>> latest F10 kernels just to see if anything has changed...
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Justin Piszcz"
>>>>>>> Sent: Monday, September 22, 2008 2:05 AM
>>>>>>> Subject: Re: Hardware RAID
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, 21 Sep 2008, Brian Rademacher wrote:
>>>>>>>>
>>>>>>>>> The RAID gods must have been thinking about me. My MB has one of
>>>>>>>>> these funny slots and supports ZCR, so for the price I'm going to jump ship.
>>>>>>>>> I would guess (and hope) this solves the problem, especially since I'll have
>>>>>>>>> to reconstruct the entire array...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://cgi.ebay.com/2113600-R-Adaptec-Serial-ATA-RAID-2025SA-Storage_W0QQitemZ250295938636QQihZ015QQcategoryZ167QQssPageNameZWDVWQQrdZ1QQcmdZViewItem
>>>>>>>>
>>>>>>>> Hm cool-- let me know how it goes.
>>