Re: 2.6.29 regression: ATA bus errors on resume

From: Niel Lambrechts
Date: Mon Apr 06 2009 - 17:26:51 EST


On 04/06/2009 09:39 PM, Tejun Heo wrote:
> Hello,
>
> Niel Lambrechts wrote:
>
>> On 04/06/2009 12:09 PM, Tejun Heo wrote:
>>
>>>> Will the fix naturally make its way into the mainline kernel, or is
>>>> there any extra debugging/testing I can help with?
>>>>
>>> Well, the problem is the debug patch doesn't actually do anything
>>> other than printing out messages. It could be that the problem is
>>> timing-dependent (which is likely anyway). You can still reproduce
>>> the problem with the patch, right?
>>>
>> Huh? You provided two patches; about the last one you said:
>>
>
> Yeah, the second one actually only added printks to see whether that's
> the case. No behavior change.
>
>
>>> Strange. Maybe IO commands are getting through while the sdev is
>>> still in quiesce state? Can you please repeat the test with the
>>> attached patch?
>>>
>> With the latter, I have not encountered the original problem (i.e.
>> severe EXT4 corruption) again, neither in 2.6.29 nor in 2.6.29.1.
>>
>
> Eh... so we're definitely seeing something that is timing-dependent.
>
>
>> Do I also need to try the last patch without any debugging messages?
>>
>
> Then there will be nothing left. :-) Can you please try the attached
> patch? It's still only debug messages but lighter; hopefully, it
> won't mask the problem.
>
Sorry, my bad; I assumed the second patch actually made a functional
difference... :)

Here is the output on 2.6.29.1 with your new patch. Still nothing
serious is happening, and resume still seems okay!

cheers
Niel
Apr 6 23:12:40 linux-7vph kernel: Syncing filesystems ... done.
Apr 6 23:12:40 linux-7vph kernel: Freezing user space processes ... (elapsed 0.00 seconds) done.
Apr 6 23:12:40 linux-7vph kernel: Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Apr 6 23:12:40 linux-7vph kernel: PM: Shrinking memory... done (64710 pages freed)
Apr 6 23:12:40 linux-7vph kernel: PM: Freed 258840 kbytes in 2.57 seconds (100.71 MB/s)
Apr 6 23:12:40 linux-7vph kernel: Suspending console(s) (use no_console_suspend to debug)
Apr 6 23:12:40 linux-7vph kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr 6 23:12:40 linux-7vph kernel: ACPI handle has no context!
Apr 6 23:12:40 linux-7vph kernel: iwlagn 0000:03:00.0: PCI INT A disabled
Apr 6 23:12:40 linux-7vph kernel: ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 23:12:40 linux-7vph kernel: ata2: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 23:12:40 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PCI INT D disabled
Apr 6 23:12:40 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.2: PCI INT C disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.1: PCI INT B disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.0: PCI INT A disabled
Apr 6 23:12:41 linux-7vph kernel: HDA Intel 0000:00:1b.0: PCI INT B disabled
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PCI INT D disabled
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.2: PCI INT C disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.1: PCI INT B disabled
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.0: PCI INT A disabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PME# enabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability enabled by ACPI
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PME# enabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability enabled by ACPI
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PCI INT A disabled
Apr 6 23:12:41 linux-7vph kernel: ACPI: Preparing to enter system sleep state S4
Apr 6 23:12:41 linux-7vph kernel: PM: Saving platform NVS memory
Apr 6 23:12:41 linux-7vph kernel: Disabling non-boot CPUs ...
Apr 6 23:12:41 linux-7vph kernel: CPU 1 is now offline
Apr 6 23:12:41 linux-7vph kernel: SMP alternatives: switching to UP code
Apr 6 23:12:41 linux-7vph kernel: CPU0 attaching NULL sched-domain.
Apr 6 23:12:41 linux-7vph kernel: CPU1 attaching NULL sched-domain.
Apr 6 23:12:41 linux-7vph kernel: CPU0 attaching NULL sched-domain.
Apr 6 23:12:41 linux-7vph kernel: CPU1 is down
Apr 6 23:12:41 linux-7vph kernel: Extended CMOS year: 2000
Apr 6 23:12:41 linux-7vph kernel: PM: Creating hibernation image:
Apr 6 23:12:41 linux-7vph kernel: PM: Need to copy 124169 pages
Apr 6 23:12:41 linux-7vph kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Apr 6 23:12:41 linux-7vph kernel: Intel machine check architecture supported.
Apr 6 23:12:41 linux-7vph kernel: Intel machine check reporting enabled on CPU#0.
Apr 6 23:12:41 linux-7vph kernel: PM: Restoring platform NVS memory
Apr 6 23:12:41 linux-7vph kernel: Extended CMOS year: 2000
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900007, writing 0x900403)
Apr 6 23:12:41 linux-7vph kernel: HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100106, writing 0x100102)
Apr 6 23:12:41 linux-7vph kernel: ahci 0000:00:1f.2: restoring config space at offset 0x1 (was 0x2b00403, writing 0x2b00407)
Apr 6 23:12:41 linux-7vph kernel: Enabling non-boot CPUs ...
Apr 6 23:12:41 linux-7vph kernel: SMP alternatives: switching to SMP code
Apr 6 23:12:41 linux-7vph kernel: Booting processor 1 APIC 0x1 ip 0x6000
Apr 6 23:12:41 linux-7vph kernel: Initializing CPU#1
Apr 6 23:12:41 linux-7vph kernel: Calibrating delay using timer specific routine.. 5054.09 BogoMIPS (lpj=10108190)
Apr 6 23:12:41 linux-7vph kernel: CPU: L1 I cache: 32K, L1 D cache: 32K
Apr 6 23:12:41 linux-7vph kernel: CPU: L2 cache: 6144K
Apr 6 23:12:41 linux-7vph kernel: [ds] using Core 2/Atom configuration
Apr 6 23:12:41 linux-7vph kernel: CPU: Physical Processor ID: 0
Apr 6 23:12:41 linux-7vph kernel: CPU: Processor Core ID: 1
Apr 6 23:12:41 linux-7vph kernel: Intel machine check architecture supported.
Apr 6 23:12:41 linux-7vph kernel: Intel machine check reporting enabled on CPU#1.
Apr 6 23:12:41 linux-7vph kernel: x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
Apr 6 23:12:41 linux-7vph kernel: CPU1: Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz stepping 06
Apr 6 23:12:41 linux-7vph kernel: CPU0 attaching NULL sched-domain.
Apr 6 23:12:41 linux-7vph kernel: Switched to high resolution mode on CPU 1
Apr 6 23:12:41 linux-7vph kernel: CPU0 attaching sched-domain:
Apr 6 23:12:41 linux-7vph kernel: domain 0: span 0-1 level MC
Apr 6 23:12:41 linux-7vph kernel: groups: 0 1
Apr 6 23:12:41 linux-7vph kernel: CPU1 attaching sched-domain:
Apr 6 23:12:41 linux-7vph kernel: domain 0: span 0-1 level MC
Apr 6 23:12:41 linux-7vph kernel: groups: 1 0
Apr 6 23:12:41 linux-7vph kernel: CPU1 is up
Apr 6 23:12:41 linux-7vph kernel: ACPI: Waking up from system sleep state S4
Apr 6 23:12:41 linux-7vph kernel: ACPI: EC: non-query interrupt received, switching to interrupt mode
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.1: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:03.0: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability disabled by ACPI
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: wake-up capability disabled by ACPI
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: e1000e 0000:00:19.0: irq 29 for MSI/MSI-X
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.0: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.1: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.2: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.2: PCI INT C -> GSI 22 (level, low) -> IRQ 22
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1a.2: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PCI INT D -> GSI 23 (level, low) -> IRQ 23
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1a.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: HDA Intel 0000:00:1b.0: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Apr 6 23:12:41 linux-7vph kernel: HDA Intel 0000:00:1b.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: pcieport-driver 0000:00:1c.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: pcieport-driver 0000:00:1c.1: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: pcieport-driver 0000:00:1c.3: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: pcieport-driver 0000:00:1c.4: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.0: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.1: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
Apr 6 23:12:41 linux-7vph kernel: uhci_hcd 0000:00:1d.2: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1d.7: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PCI INT D -> GSI 19 (level, low) -> IRQ 19
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1d.7: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ehci_hcd 0000:00:1d.7: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:1e.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ahci 0000:00:1f.2: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x16 frozen
Apr 6 23:12:41 linux-7vph kernel: ata2: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x16 frozen
Apr 6 23:12:41 linux-7vph kernel: iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Apr 6 23:12:41 linux-7vph kernel: iwlagn 0000:03:00.0: irq 30 for MSI/MSI-X
Apr 6 23:12:41 linux-7vph kernel: pci 0000:15:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Apr 6 23:12:41 linux-7vph kernel: Registered led device: iwl-phy0:radio
Apr 6 23:12:41 linux-7vph kernel: Registered led device: iwl-phy0:assoc
Apr 6 23:12:41 linux-7vph kernel: Registered led device: iwl-phy0:RX
Apr 6 23:12:41 linux-7vph kernel: Registered led device: iwl-phy0:TX
Apr 6 23:12:41 linux-7vph kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[17] MMIO=[f4801000-f48017ff] Max Packet=[2048] IR/IT contexts=[4/4]
Apr 6 23:12:41 linux-7vph kernel: pci 0000:15:00.2: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:15:00.3: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:15:00.4: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: pci 0000:15:00.5: PME# disabled
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Starting disk
Apr 6 23:12:41 linux-7vph kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 6 23:12:41 linux-7vph kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: configured for UDMA/133
Apr 6 23:12:41 linux-7vph kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
Apr 6 23:12:41 linux-7vph kernel: ata1: irq_stat 0x00400040, connection status changed
Apr 6 23:12:41 linux-7vph kernel: ata1: hard resetting link
Apr 6 23:12:41 linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata2.00: ACPI cmd e3/00:02:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata2.00: ACPI cmd e3/00:02:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata2.00: configured for UDMA/133
Apr 6 23:12:41 linux-7vph kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
Apr 6 23:12:41 linux-7vph kernel: ata2: irq_stat 0x40000001
Apr 6 23:12:41 linux-7vph kernel: ata2.00: configured for UDMA/133
Apr 6 23:12:41 linux-7vph kernel: ata2: EH complete
Apr 6 23:12:41 linux-7vph kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded
Apr 6 23:12:41 linux-7vph kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
Apr 6 23:12:41 linux-7vph kernel: ata1.00: configured for UDMA/133
Apr 6 23:12:41 linux-7vph kernel: ata1: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 t3
Apr 6 23:12:41 linux-7vph kernel: ata1.00: configured for UDMA/133
Apr 6 23:12:41 linux-7vph kernel: ata1: EH complete
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors: (200 GB/186 GiB)
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Write Protect is off
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors: (200 GB/186 GiB)
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Write Protect is off
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Apr 6 23:12:41 linux-7vph kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: power state changed by ACPI to D0
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900407, writing 0x900403)
Apr 6 23:12:41 linux-7vph kernel: pci 0000:00:02.0: setting latency timer to 64
Apr 6 23:12:41 linux-7vph kernel: Restarting tasks ... done.