Re: ATA failure regression in kernel 4.2

From: Alexander Holler
Date: Tue Jul 28 2015 - 14:37:44 EST


On 28.07.2015 at 20:19, Alex Deucher wrote:
On Mon, Jul 27, 2015 at 12:30 PM, Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx> wrote:
On 2015/7/27 23:21, Alex Deucher wrote:
On Sun, Jul 26, 2015 at 11:01 PM, Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx> wrote:
On 2015/7/25 1:38, Alex Deucher wrote:
On Thu, Jul 23, 2015 at 2:44 PM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Thu, Jul 23, 2015 at 2:35 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
Hello,

On Thu, Jul 23, 2015 at 01:48:24PM -0400, Alex Deucher wrote:
Something new in kernel 4.2 seems to have broken one of my hard drives
(an SSD). 4.1 and older kernels work fine. Here are the
relevant logs.

...
[ 6.547628] ata2.00: qc timeout (cmd 0xec)
[ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 16.997819] ata2.00: qc timeout (cmd 0xec)
[ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
[ 17.457400] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 47.429257] ata2.00: qc timeout (cmd 0xec)
[ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 47.888822] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
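
For comparing a good and a bad boot, the repeated IDENTIFY failures in a log like the one above can be counted per device. This is only a rough sketch: the function name count_identify_failures and the regular expression are made up for illustration, and it assumes the dmesg lines look exactly like the excerpt quoted here.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   [ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
IDENTIFY_FAILURE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?): failed to IDENTIFY"
)

def count_identify_failures(log_text):
    """Return a Counter mapping ATA device name to number of IDENTIFY failures."""
    failures = Counter()
    for line in log_text.splitlines():
        match = IDENTIFY_FAILURE.search(line)
        if match:
            failures[match.group(2)] += 1
    return failures

# Sample lines copied from the log quoted in this thread.
sample = """\
[ 6.547628] ata2.00: qc timeout (cmd 0xec)
[ 6.547721] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 7.007213] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 16.997819] ata2.00: qc timeout (cmd 0xec)
[ 16.997910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 16.997995] ata2: limiting SATA link speed to 3.0 Gbps
[ 47.429349] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
"""

failures = count_identify_failures(sample)
```

Running the same function over the dmesg output of a 4.1 boot and a 4.2 boot would show whether the failures are confined to one port.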

Nothing really rings a bell. Timeouts on IDENTIFY. Could be IRQ
related. Which controller is it (lspci -nn)? Also, can you try to
bisect the issue?

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7801] (rev 40)
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD] FCH
IDE Controller [1022:780c]

I can take a look at bisecting later this week.

You were right about the interrupts. This is an AMD Kaveri APU system.
Hi Alex,
Could you please help to provide more information about the
system so we could identify the issue? Dmesg and /proc/interrupts
from good and bad kernels are welcomed.
Thanks!
Gerry

See attached. Thanks!
Hi Alex,
Thanks for the info. Seems something is wrong with multiple-MSI
support. To narrow down the scope, could you please help to:

I'm also not getting interrupts in my gpu driver. I haven't bisected
this specifically, but I suspect it is related since it used to
work in 4.1. Whether I enable MSIs or not in my driver, I get a huge
number of interrupts on all CPUs as soon as the driver is loaded, but
the driver ISR never gets called. E.g.,
49: 117757835 117763227 117787837 117868913 PCI-MSI 524288-edge amdgpu
nointremap doesn't seem to help.
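
To put a number on "huge", a /proc/interrupts row like the one above can be split into its per-CPU counters and summed. A minimal sketch follows; the helper name parse_interrupt_row is hypothetical, and it assumes the description part of the row never starts with an all-digit token (a more careful version would take the CPU count from the header row instead).

```python
def parse_interrupt_row(row):
    """Split one /proc/interrupts row into (irq, per_cpu_counts, description)."""
    irq, _, rest = row.partition(":")
    fields = rest.split()
    counts = []
    # Leading all-digit fields are treated as per-CPU counters (an assumption);
    # everything after them is the controller, trigger type and driver name.
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    return irq.strip(), counts, " ".join(fields)

# Row copied from the message above, joined onto a single line.
row = "49: 117757835 117763227 117787837 117868913 PCI-MSI 524288-edge amdgpu"
irq, counts, desc = parse_interrupt_row(row)
total = sum(counts)
```

Sampling the same row twice a few seconds apart and comparing the totals would give the interrupt rate, which is more telling than a single snapshot.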

Same problem here with the AMD SATA controller (1022:7801). It failed to identify the second disk when using 4.2-rc4.

Booting with nointremap helped; I have not tested anything else.

Regards,

Alexander Holler