Re: lspci CorrErr- UnsuppReq- changes (was: Asus eeepc 1008HA suspend issue and mac80211 suspend corner case)

From: Luis R. Rodriguez
Date: Tue Dec 22 2009 - 14:13:07 EST


Changing the subject to the PCI specific questions. Hoping someone
from linux-pci can help me understand the meaning of the PCI config
space changes a little better.

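I notice that when my device becomes unresponsive after pm-suspend, both
CorrErr+ and UnsuppReq+ change to CorrErr- and UnsuppReq-.

These should correspond to PCI_EXP_DEVSTA_CED and PCI_EXP_DEVSTA_URD
among the following Device Status bits: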

#define PCI_EXP_DEVSTA 10 /* Device Status */
#define PCI_EXP_DEVSTA_CED 0x01 /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED 0x02 /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED 0x04 /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD 0x08 /* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_AUXPD 0x10 /* AUX Power Detected */
#define PCI_EXP_DEVSTA_TRPND 0x20 /* Transactions Pending */
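
To make the mapping between these bits and the lspci flags concrete, here
is a minimal user-space sketch that renders a raw Device Status value in
the same style as the DevSta line lspci prints. The helper name
devsta_format() and the flag table are my own for illustration (they are
not kernel or pciutils code); the flag names and their order are taken
from the lspci output quoted further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_EXP_DEVSTA_CED   0x01 /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED  0x02 /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED   0x04 /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD   0x08 /* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_AUXPD 0x10 /* AUX Power Detected */
#define PCI_EXP_DEVSTA_TRPND 0x20 /* Transactions Pending */

/* Hypothetical table: bit mask paired with the name lspci prints for it. */
struct devsta_flag {
	uint16_t mask;
	const char *name;
};

static const struct devsta_flag devsta_flags[] = {
	{ PCI_EXP_DEVSTA_CED,   "CorrErr"   },
	{ PCI_EXP_DEVSTA_NFED,  "UncorrErr" },
	{ PCI_EXP_DEVSTA_FED,   "FatalErr"  },
	{ PCI_EXP_DEVSTA_URD,   "UnsuppReq" },
	{ PCI_EXP_DEVSTA_AUXPD, "AuxPwr"    },
	{ PCI_EXP_DEVSTA_TRPND, "TransPend" },
};

/*
 * Write "Name+ Name- ..." for each flag into out and return out.
 * Output is silently cut short if the buffer is too small.
 */
static char *devsta_format(uint16_t sta, char *out, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(out != NULL && len > 0);
	out[0] = '\0';

	for (i = 0; i < sizeof(devsta_flags) / sizeof(devsta_flags[0]); i++) {
		int n = snprintf(out + used, len - used, "%s%s%c",
				 i ? " " : "",
				 devsta_flags[i].name,
				 (sta & devsta_flags[i].mask) ? '+' : '-');

		if (n < 0 || (size_t)n >= len - used)
			break; /* buffer full, stop here */
		used += (size_t)n;
	}
	return out;
}
```

With this, a value of 0x19 (CED | URD | AUXPD) corresponds to the working
state and 0x10 (AUXPD only) to the state I see after the failed resume.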

I don't see the kernel using them, except in tg3's tg3_chip_reset(), which
clears PCI_EXP_DEVSTA on PCI Express devices upon reset for some reason:

http://lxr.linux.no/#linux+v2.6.32/drivers/net/tg3.c#L6522
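
A small model of what such a clearing write would do, under the assumption
that the four error bits are write-one-to-clear (RW1C), which is how I
understand the PCI Express spec to define them, while AuxPwr and TransPend
are read-only. devsta_after_write() and DEVSTA_RW1C_MASK are hypothetical
names for this sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_DEVSTA_CED   0x01 /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED  0x02 /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED   0x04 /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD   0x08 /* Unsupported Request Detected */

/* Assumption: only these four bits react to writes (RW1C). */
#define DEVSTA_RW1C_MASK (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
			  PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)

/*
 * Model the register value after software writes 'written' to it:
 * every RW1C bit written as 1 is cleared, everything else is kept.
 */
static uint16_t devsta_after_write(uint16_t current, uint16_t written)
{
	return current & (uint16_t)~(written & DEVSTA_RW1C_MASK);
}
```

Under that assumption, writing all four error bits to a register holding
0x19 leaves 0x10, which is the same transition lspci shows for my device,
although a reset of the function across suspend could produce it as well.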

But this was added as a workaround:

commit 5e7dfd0fb94abed04f59481d1ce0cc06a892048a
Author: Matt Carlson <mcarlson@xxxxxxxxxxxx>
Date: Fri Nov 21 17:18:16 2008 -0800

tg3: Prevent corruption at 10 / 100Mbps w CLKREQ

This patch disables CLKREQ at 10Mbps and 100Mbps to workaround a TX BD
corruption issue. This problem only affects the 5784 and 5761 (and
57780 AX) ASIC revisions.

Signed-off-by: Matt Carlson <mcarlson@xxxxxxxxxxxx>
Signed-off-by: Michael Chan <mchan@xxxxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

Apart from this I don't see any other uses of PCI_EXP_DEVSTA_CED and
PCI_EXP_DEVSTA_URD in the kernel as of 2.6.32, so I will likely need
to start looking at the PCI spec to decipher this. Before I do so,
I am still curious what it means for lspci to show these two flags
going from + to -.
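
For anyone who wants to look at the raw register instead of lspci's
rendering, here is a sketch of how Device Status can be located in a dump
of the device's config space (for example the bytes read from
/sys/bus/pci/devices/0000:01:00.0/config). It walks the standard
capability list until it finds the PCI Express capability and adds the
PCI_EXP_DEVSTA offset. The helper names cfg_read16() and
pcie_devsta_offset() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_STATUS          0x06 /* 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x10 /* capability list is present */
#define PCI_CAPABILITY_LIST 0x34 /* offset of first capability */
#define PCI_CAP_ID_EXP      0x10 /* PCI Express capability ID */
#define PCI_EXP_DEVSTA      10   /* Device Status, relative to the cap */

/* Config space is little-endian; caller guarantees off + 1 is in range. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Return the config-space offset of the PCIe Device Status register
 * in the dump cfg[0..len), or -1 if there is no PCI Express capability.
 */
static int pcie_devsta_offset(const uint8_t *cfg, size_t len)
{
	int ttl = 48; /* bound the walk in case the list loops */
	uint8_t pos;

	assert(cfg != NULL);

	if (len < 64 || !(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return -1;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if ((size_t)pos + 2 > len)
			return -1; /* capability header outside the dump */
		if (cfg[pos] == PCI_CAP_ID_EXP) {
			if ((size_t)pos + PCI_EXP_DEVSTA + 2 > len)
				return -1;
			return pos + PCI_EXP_DEVSTA;
		}
		pos = cfg[pos + 1] & 0xfc; /* next capability pointer */
	}
	return -1;
}
```

On my device the Express capability sits at 0x60 according to lspci, so
the register would be at 0x6a; reading it before and after a bad resume
and XOR-ing the two values gives exactly the bits that flipped.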

I leave below the relevant PCI sections of my last post.

Luis

On Mon, Dec 21, 2009 at 09:23:55PM -0500, Luis R. Rodriguez wrote:
> As for the specific Asus eeepc 1008HA issue, what I'm seeing is ath9k
> talking to hardware fine prior to suspend, disabling hardware, and then
> upon resume it becomes unusable, failing at the first hardware reset.
> lspci tells me the following when the device is functional, both during
> initial boot, and during successful pm-suspend cycles:
>
> 01:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
> Subsystem: Device 1a3b:1089
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 32 bytes
> Interrupt: pin A routed to IRQ 18
> Region 0: Memory at fbef0000 (64-bit, non-prefetchable) [size=64K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
> Address: 00000000 Data: 0000
> Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
> ClockPM- Suprise- LLActRep- BwNot-
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> Capabilities: [100] Advanced Error Reporting <?>
> Capabilities: [140] Virtual Channel <?>
> Capabilities: [160] Device Serial Number 12-14-24-ff-ff-17-15-00
> Capabilities: [170] Power Budgeting <?>
> Kernel driver in use: ath9k
> Kernel modules: ath9k
>
> I do notice a difference when resume goes bust and the ath9k device becomes unhappy. This
> is what I see:
>
> --- lspci-ok.txt 2009-12-21 17:22:24.000000000 -0800
> +++ lspci-busted.txt 2009-12-21 17:22:50.000000000 -0800
> @@ -16,7 +16,7 @@
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> - DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> + DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
> ClockPM- Suprise- LLActRep- BwNot-
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>
> The line in question is the PCI Express Device Status. CorrErr indicates
> "Correctable Error Detected" and UnsuppReq indicates "Unsupported
> Request Detected". It's not entirely clear to me what exact unsupported
> request must have been sent. I've considered getting help to look at this
> with a PCI analyzer but first I wanted to check and see if others are seeing
> this with the 1008HA or similar platform families and if there are pointers
> anyone can give.