Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen

From: Brian Rademacher
Date: Tue Sep 23 2008 - 17:05:36 EST


I disabled NCQ and got the same thing... It just says DMA freeze instead of NCQ freeze...

----- Original Message ----- From: "Gwendal Grignou" <gwendal@xxxxxxxxxx>
To: "Justin Piszcz" <jpiszcz@xxxxxxxxxxxxxxx>
Cc: "Brian Rademacher" <rad@xxxxxxxxxxxx>; <linux-ide@xxxxxxxxxxxxxxx>; <linux-raid@xxxxxxxxxxxxxxx>; <linux-kernel@xxxxxxxxxxxxxxx>
Sent: Tuesday, September 23, 2008 12:14 PM
Subject: Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen


About the ata1:0 problem, as reported in the bugzilla bug: I would try
disabling NCQ to see if it helps. Your disks' firmware might not fully
support it.

You can either add the parameter "libata.force=noncq" to the kernel
command line at boot, or set queue_depth to 1 for all the Seagate drives
behind the Marvell MV88SX6081 controller.
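
For the queue_depth route, here is a minimal sketch in Python. It assumes
the script runs as root and that the per-device knob lives at
/sys/block/<dev>/device/queue_depth. The helper name set_queue_depth, the
sysfs_root parameter and the device names in the usage note are
illustrative assumptions, not something taken from this thread.

```python
import os


def set_queue_depth(devices, depth=1, sysfs_root="/sys/block"):
    """Write `depth` to <sysfs_root>/<dev>/device/queue_depth for each device.

    Returns a dict mapping each device name to the value read back
    from the same file after the write.
    """
    results = {}
    for dev in devices:
        path = os.path.join(sysfs_root, dev, "device", "queue_depth")
        # A depth of 1 means only one command is outstanding at a time,
        # which effectively turns NCQ off for that drive.
        with open(path, "w") as f:
            f.write(str(depth))
        with open(path) as f:
            results[dev] = f.read().strip()
    return results
```

Something like set_queue_depth(["sda", "sdb"]) would then cover two
drives; substitute whatever names the Seagate drives actually have on
your system. The setting does not survive a reboot.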

About ata5:0, someone (in user space, probably) is trying to do a
SMART ENABLE operation, but the device ignores it. I don't know which
device you are using, but I assume it does not support the ATA SMART
feature set. A timeout is an acceptable but not a nice way to answer; a
cancel would have been better. Check if there is a firmware upgrade
for your device.
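
The SMART ENABLE reading comes from the "cmd" line in the ata5.00 log
quoted further down (b0/d8:00:00:4f:c2/...). As a rough sketch of how to
decode such a line in Python: the field order is the one libata's error
handler uses when printing "cmd" and "res" lines, and the opcode values
are from the ATA command set. The lookup tables are deliberately
incomplete, and the function names are only illustrative.

```python
# Order of the hex fields in a libata "cmd"/"res" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:...:hob_lbah/device
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

ATA_CMD_SMART = 0xB0
# Partial table of SMART subcommands, selected by the feature register.
SMART_SUBCOMMANDS = {
    0xD0: "SMART READ DATA",
    0xD8: "SMART ENABLE OPERATIONS",
    0xD9: "SMART DISABLE OPERATIONS",
    0xDA: "SMART RETURN STATUS",
}
# Status register bits; in a "res" line the first field is the status.
STATUS_BITS = ((0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR"))


def parse_taskfile(text):
    """Split 'b0/d8:00:...' into a dict of named integer fields."""
    parts = text.strip().replace("/", ":").split(":")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    return {name: int(value, 16) for name, value in zip(FIELDS, parts)}


def describe_command(text):
    """Return (description, signature_ok) for a 'cmd' taskfile string."""
    tf = parse_taskfile(text)
    if tf["command"] != ATA_CMD_SMART:
        return ("not a SMART command", False)
    name = SMART_SUBCOMMANDS.get(tf["feature"], "unknown SMART subcommand")
    # SMART commands carry the signature 0x4F / 0xC2 in LBA mid / LBA high.
    signature_ok = (tf["lbam"], tf["lbah"]) == (0x4F, 0xC2)
    return (name, signature_ok)


def decode_status(status):
    """List the names of the status bits that are set."""
    return [name for bit, name in STATUS_BITS if status & bit]
```

Feeding it the "cmd" string from the log identifies command 0xB0 with
feature 0xD8, and decoding the first byte of the "res" line (0x40) gives
the same DRDY status the kernel prints.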

Gwendal.

On Mon, Sep 22, 2008 at 6:26 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
From Brian's earlier e-mail:

> I filed this kernel bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=462425


On Mon, 22 Sep 2008, Justin Piszcz wrote:

I could not agree more.

CC'ing the relevant mailing lists to see if someone out there has any idea
what more we could do as this has been affecting you (more so than myself,
but I would still like to get some sort of resolution as well, as it still
happens to me too):

Similar, but not the same issue:

Sep 17 20:20:05 p34 kernel: [1422169.440538] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 17 20:20:05 p34 kernel: [1422169.440549] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
Sep 17 20:20:05 p34 kernel: [1422169.440551] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 17 20:20:05 p34 kernel: [1422169.440556] ata5.00: status: { DRDY }
Sep 17 20:20:05 p34 kernel: [1422169.440561] ata5: hard resetting link
Sep 17 20:20:06 p34 kernel: [1422169.744980] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 17 20:20:06 p34 kernel: [1422169.770448] ata5.00: configured for UDMA/133
Sep 17 20:20:06 p34 kernel: [1422169.770461] ata5: EH complete

(The log above is from kernel 2.6.23.3.)

On Mon, 22 Sep 2008, Brian Rademacher wrote:

Works fine... Also works under heavy load with only 4 drives. I could
only get it to fail by doing a RAID resync with 4 drives, except for the
newer kernel, which dies pretty easily...

What is really frustrating about it is that short of the bugzilla bug I
submitted, I don't know who would be willing to listen... A lot of the Google
hits when searching "action 0x2 frozen" are related to a particular CDROM
drive, or general hardware failure. I really don't think that is the case
here, but I bet most of the kernel people think the same thing, so they have
no reason to care...


Sent: Monday, September 22, 2008 7:04 AM
Subject: Re: Hardware RAID


What about if you just 'stress' one drive?

1. dd if=/dev/sda of=/dev/null bs=1M &
Does it do it?
2. Same thing for sdb?
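
If you would rather script the single-drive test and get a byte count
back, a rough Python equivalent of the dd line is sketched below. It is
only an illustration: the function name sequential_read and its
parameters are made up here, and like dd without direct I/O it goes
through the page cache.

```python
import time


def sequential_read(path, block_size=1024 * 1024, max_bytes=None):
    """Read `path` front to back in chunks and throw the data away.

    Returns (bytes_read, elapsed_seconds). With max_bytes=None the whole
    file or device is read, as `dd if=... of=/dev/null bs=1M` would do.
    """
    total = 0
    start = time.monotonic()
    # buffering=0: each read() asks the kernel for at most `want` bytes.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            want = block_size
            if max_bytes is not None:
                want = min(block_size, max_bytes - total)
            chunk = f.read(want)
            if not chunk:
                break  # end of device or file
            total += len(chunk)
    return total, time.monotonic() - start
```

Calling sequential_read("/dev/sda") needs root and reads the entire
disk; pass max_bytes to stop early, and watch the kernel log for the
exception while it runs.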

Justin.

On Mon, 22 Sep 2008, Brian Rademacher wrote:

I killed smartd for testing. Other than that, it seems entirely load
based. Anything disk intensive (backups, raid resync, a bunch of spam comes
in at once, etc.) makes it fail...

Sent: Monday, September 22, 2008 6:29 AM
Subject: Re: Hardware RAID


While the error happens for me as well, it does NOT happen with that
much consistency. If I were you, I would start testing different kernels and
run it in single-user mode (or as close to it as you can) to see if you can
narrow down what is causing it. Also boot Knoppix and see if it occurs there.

Justin.

On Mon, 22 Sep 2008, Brian Rademacher wrote:

Doesn't look like a very powerful RAID card, so I may pass on it. I
don't think it will have the bandwidth to run as fast as the software RAID
currently does, since it's only a 64-bit/66 MHz PCI slot...

I hate to do the hardware RAID thing, but this error is killing me:
Sep 21 12:05:19 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 12:32:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 12:41:34 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 12:58:22 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 13:11:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 13:23:55 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 13:54:23 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 15:15:04 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 15:44:06 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 21 21:15:12 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
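
To put numbers on how often this hits, the exception lines can be pulled
out of the log with a short script. The Python sketch below copes with
entries that the mailer wrapped across two lines. It treats SAct as the
bitmask of NCQ tags that were outstanding (so 0x1 means tag 0), which is
how libata reports it. The function names are illustrative only.

```python
import re
from collections import Counter

# Matches the libata EH summary line once whitespace is normalised.
EXC_RE = re.compile(
    r"(?P<dev>ata\d+\.\d+): exception Emask (?P<emask>0x[0-9a-f]+) "
    r"SAct (?P<sact>0x[0-9a-f]+) SErr (?P<serr>0x[0-9a-f]+) "
    r"action (?P<action>0x[0-9a-f]+)(?P<frozen> frozen)?"
)


def parse_exceptions(log_text):
    """Return one dict per 'exception' line found in log_text."""
    # Collapse newlines and runs of spaces so wrapped entries match too.
    flat = " ".join(log_text.split())
    events = []
    for m in EXC_RE.finditer(flat):
        events.append({
            "dev": m.group("dev"),
            "emask": int(m.group("emask"), 16),
            "sact": int(m.group("sact"), 16),
            "serr": int(m.group("serr"), 16),
            "action": int(m.group("action"), 16),
            "frozen": m.group("frozen") is not None,
        })
    return events


def active_tags(sact):
    """List the NCQ tag numbers whose bits are set in SAct."""
    return [bit for bit in range(32) if sact & (1 << bit)]


def count_by_device(events):
    """Count exception events per ataN.NN device."""
    return Counter(e["dev"] for e in events)
```

Running parse_exceptions() over the text of /var/log/messages (or
wherever your syslog puts kernel messages) and then count_by_device()
on the result shows whether one port or several are affected.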

And at this point, I can either go back to a 4-drive RAID and not
update the kernel, or move forward with hardware RAID...

I don't see a fix coming any time soon, but maybe I'll try one of the
latest F10 kernels just to see if anything has changed...


----- Original Message ----- From: "Justin Piszcz"
Sent: Monday, September 22, 2008 2:05 AM
Subject: Re: Hardware RAID




On Sun, 21 Sep 2008, Brian Rademacher wrote:

The RAID gods must have been thinking about me. My MB has one of
these funny slots and supports ZCR, so for the price I'm going to jump ship.
I would guess (and hope) this solves the problem, especially since I'll have
to reconstruct the entire array...


http://cgi.ebay.com/2113600-R-Adaptec-Serial-ATA-RAID-2025SA-Storage_W0QQitemZ250295938636QQihZ015QQcategoryZ167QQssPageNameZWDVWQQrdZ1QQcmdZViewItem

Hm cool-- let me know how it goes.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

