[igb] AER timeout - resend.

From: Ian Kumlien
Date: Mon Feb 23 2015 - 09:57:22 EST


Sending this to both netdev and linux-kernel since I don't know whether
it's the driver or PCIe AER that is doing something odd. The machine was
stable before 3.19 and PCIe AER.

It all started with what I first sent to linux nics () intel:
------

Today I had some issues, and when I looked into why things were broken I was met with:

[950016.366477] pcieport 0000:00:04.0: AER: Uncorrected (Non-Fatal) error received: id=0500
[950016.366495] igb 0000:05:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0500(Requester ID)
[950016.366502] igb 0000:05:00.0: device [8086:1521] error status/mask=00004000/00000000
[950016.366509] igb 0000:05:00.0: [14] Completion Timeout
[950016.366519] igb 0000:05:00.0: broadcast error_detected message
[950016.379742] br0: port 1(enp5s0f0) entered disabled state
[950016.488213] igb 0000:05:00.0: broadcast slot_reset message
[950016.588014] igb 0000:05:00.0: broadcast resume message
[950016.752654] igb 0000:05:00.0: AER: Device recovery successful
[950019.817249] igb 0000:05:00.1 enp5s0f1: igb: enp5s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[950020.699773] igb 0000:05:00.0 enp5s0f0: igb: enp5s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[950020.701485] br0: port 1(enp5s0f0) entered forwarding state
[950020.701504] br0: port 1(enp5s0f0) entered forwarding state
[976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[976152.448100] ata5: irq_stat 0x00400040, connection status changed
[976152.448107] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
[976152.448117] ata5: hard resetting link
[976152.448134] ata6: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[976152.448140] ata6: irq_stat 0x00400040, connection status changed
[976152.448147] ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
[976152.448155] ata6: hard resetting link
[976153.171195] ata6: SATA link down (SStatus 0 SControl 300)
[976158.174058] ata6: hard resetting link
[976158.174110] ata5: SATA link down (SStatus 0 SControl 300)
[976163.176997] ata5: hard resetting link
[976163.480133] ata6: SATA link down (SStatus 0 SControl 300)
[976163.480147] ata6: limiting SATA link speed to 1.5 Gbps
[976168.483028] ata6: hard resetting link
[976168.483095] ata5: SATA link down (SStatus 0 SControl 300)
[976168.483108] ata5: limiting SATA link speed to 1.5 Gbps
[976173.485907] ata5: hard resetting link
[976173.789066] ata6: SATA link down (SStatus 0 SControl 310)
[976173.789080] ata6.00: disabled
[976173.791066] ata6: EH complete
[976173.791078] ata5: SATA link down (SStatus 0 SControl 310)
[976173.791085] ata6.00: detaching (SCSI 5:0:0:0)
[976173.791090] ata5.00: disabled
[976173.794073] ata5: EH complete
[976173.794100] ata5.00: detaching (SCSI 4:0:0:0)
[976173.794968] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
[976173.795073] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.795080] sd 5:0:0:0: [sdb] Stopping disk
[976173.795108] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.797180] sd 4:0:0:0: [sda] Synchronizing SCSI cache
[976173.797254] sd 4:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.797261] sd 4:0:0:0: [sda] Stopping disk
[976173.797285] sd 4:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00

So two out of two disks just failed and aren't responding anymore?

Seven hours after an AER event, the Intel SSDs in this machine, which
are idle, just stop responding? ;)
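
For reference, the "seven hours" figure comes from the dmesg timestamps
(seconds since boot). A minimal sketch of that arithmetic, assuming the
two lines below are the relevant start and end events; the helper names
are made up for illustration:

```python
import re

# Matches the leading "[seconds.microseconds]" stamp of a dmesg line.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")

def timestamp(line):
    """Return the seconds-since-boot timestamp of a dmesg line."""
    m = TS_RE.match(line)
    if m is None:
        raise ValueError("no dmesg timestamp in line: %r" % line)
    return float(m.group(1))

def gap_hours(first_line, second_line):
    """Elapsed time between two dmesg lines, in hours."""
    return (timestamp(second_line) - timestamp(first_line)) / 3600.0

# Lines copied from the log above (truncated after the timestamp and source).
aer_line = "[950016.366477] pcieport 0000:00:04.0: AER: Uncorrected (Non-Fatal)"
ata_line = "[976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800"

gap = gap_hours(aer_line, ata_line)
print("%.2f hours between the AER event and the first ata exception" % gap)
```

That works out to a bit over seven hours between the AER recovery and
the first ata5 exception.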

Anyway, I will reboot it when I get home. Any ideas or suggestions are
more than welcome.
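
In case it helps anyone reading the "error status/mask" value in the
log, here is a rough sketch of how it maps to the "[14] Completion
Timeout" line. The bit table is only a subset of the PCIe AER
Uncorrectable Error Status bits as I understand them, and decode_aer()
is a made-up helper, not anything from the kernel:

```python
# Subset of PCIe AER Uncorrectable Error Status bit positions.
# Assumption: listed from memory of the PCIe spec, not exhaustive.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC",
    20: "Unsupported Request",
}

def decode_aer(status_mask):
    """Decode a 'status/mask' hex pair into the unmasked error names."""
    status_hex, mask_hex = status_mask.split("/")
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    unmasked = status & ~mask  # masked bits are not reported
    errors = []
    for bit in range(32):
        if (unmasked >> bit) & 1:
            name = UNCORRECTABLE_BITS.get(bit, "Unknown")
            errors.append("[%d] %s" % (bit, name))
    return errors

# Value taken from the igb line in the log above.
print(decode_aer("00004000/00000000"))
```

0x4000 is bit 14 with nothing masked, which matches the Completion
Timeout the kernel printed.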