Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus

From: Alex Williamson
Date: Fri Nov 30 2018 - 00:56:16 EST


On Fri, 30 Nov 2018 05:29:47 +0000
Bharat Bhushan <bharat.bhushan@xxxxxxx> wrote:

> Hi,
>
> > -----Original Message-----
> > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > Sent: Thursday, November 29, 2018 1:46 AM
> > To: Bharat Bhushan <bharat.bhushan@xxxxxxx>
> > Cc: alex.williamson@xxxxxxxxxx; Bjorn Helgaas <helgaas@xxxxxxxxxx>;
> > linux-pci@xxxxxxxxxxxxxxx; Linux Kernel Mailing List
> > <linux-kernel@xxxxxxxxxxxxxxx>; bharatb.yadav@xxxxxxxxx; David Daney
> > <david.daney@xxxxxxxxxx>; jglauber@xxxxxxxxxx;
> > mbroemme@xxxxxxxxxx; chrisrblake93@xxxxxxxxx
> > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> >
> > On Tue, Nov 27, 2018 at 10:32 PM Bharat Bhushan
> > <bharat.bhushan@xxxxxxx> wrote:
> >
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > Sent: Tuesday, November 27, 2018 9:39 PM
> > > > To: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> > > > Cc: Bharat Bhushan <bharat.bhushan@xxxxxxx>;
> > > > linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > bharatb.yadav@xxxxxxxxx; David Daney <david.daney@xxxxxxxxxx>;
> > > > Jan Glauber <jglauber@xxxxxxxxxx>; Maik Broemme <mbroemme@xxxxxxxxxx>;
> > > > Chris Blake <chrisrblake93@xxxxxxxxx>
> > > > Subject: Re: [PATCH] PCI: Mark NXP LS1088 to avoid bus reset bus
> > > >
> > > > On Tue, 27 Nov 2018 09:33:56 -0600
> > > > Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > > > > 4) Is there a hardware erratum for this? If so, please include
> > > > > the URL here.
> > >
> > > No h/w errata as of now.
> >
> > Does that mean (a) the HW folks agree this is a hardware problem but they
> > haven't written an erratum, (b) there is an erratum but it isn't public, (c) we
> > don't have any concrete evidence of a hardware problem, but things just
> > don't work if we do a bus reset, (d) something else?
>
> I would say it is (c) - it has not been concluded to be a h/w issue.
>
> >
> > > In pci_reset_secondary_bus() I have tried to increase the delay after reset,
> > > but that did not help.
> > > Do I need to add a delay at some other place as well?
> >
> > No, I think the place you tried should be enough.
> >
> > You should also be able to exercise this from user-space by using "setpci" to
> > set and clear the Secondary Bus Reset bit in the Bridge Control register. Then
> > you can also use setpci to read/write config space of the NIC. The kernel
> > would normally read the Vendor and Device IDs as the first access to the
> > device during enumeration. You also might be able to learn something by
> > using "lspci -vv" on the bridge before and after the reset to see if it logs any
> > AER bits (if it supports AER) or the other standard error logging bits.
>
> I tried the sequence below for a secondary bus reset; afterwards the device's config space reads as 0xff
>
> root@localhost:~# lspci -x
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
>
> 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
> 00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
> 10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
> 30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00
>
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x40
> root@localhost:~# setpci -s 0002:00:00.0 0x3e.b=0x00
>
> root@localhost:~# lspci -x
> 0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
> 00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
> 20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00

Just for curiosity's sake, what if you re-write the secondary and
subordinate bus registers here:

# setpci -s 0002:00:00.0 0x19.b=0x01
# setpci -s 0002:00:00.0 0x1a.b=0xff
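
For anyone following along, the values 0x01 and 0xff in the two commands
above are simply what the bridge dump already shows at offsets 0x19 and
0x1a. Here is a minimal Python sketch that decodes that dump. The constant
names mirror the usual PCI register names, and the helper functions
(parse_lspci_x, bridge_summary) are made up for illustration; this is only
a reading aid, not part of any tool.

```python
# Offsets and bit values follow the standard PCI type 1 (bridge) header.
PCI_SECONDARY_BUS = 0x19
PCI_SUBORDINATE_BUS = 0x1A
PCI_BRIDGE_CONTROL = 0x3E          # 16-bit register, low byte at 0x3e
PCI_BRIDGE_CTL_BUS_RESET = 0x40    # Secondary Bus Reset bit

# Bridge rows copied from the "lspci -x" output quoted in this thread.
BRIDGE_DUMP = """\
00: 57 19 c0 80 07 01 10 00 10 00 04 06 08 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 01 01 00 00
20: 00 40 00 40 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 40 63 01 00 00
"""


def parse_lspci_x(dump):
    """Hypothetical helper: turn "lspci -x" rows into bytes indexed by offset."""
    data = bytearray()
    for line in dump.splitlines():
        offset, _, row = line.partition(":")
        # Each row label must match the number of bytes collected so far.
        assert int(offset, 16) == len(data), "rows must be contiguous"
        data += bytes.fromhex(row)
    return bytes(data)


def bridge_summary(cfg):
    """Hypothetical helper: pull out the registers discussed in this thread."""
    bridge_ctl = cfg[PCI_BRIDGE_CONTROL] | (cfg[PCI_BRIDGE_CONTROL + 1] << 8)
    return {
        "secondary": cfg[PCI_SECONDARY_BUS],
        "subordinate": cfg[PCI_SUBORDINATE_BUS],
        "bus_reset_asserted": bool(bridge_ctl & PCI_BRIDGE_CTL_BUS_RESET),
    }


summary = bridge_summary(parse_lspci_x(BRIDGE_DUMP))
print(summary)
```

The dump is identical before and after the reset sequence, so the bridge
still reports the same bus numbers; the experiment is only about whether
writing them again changes what the endpoint returns.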

IIRC the users that debugged the AMD bus reset issue re-wrote the
entire 64 bytes of the bridge config header and then further narrowed
the issue down to the two registers above. If one bridge
implementation can have such an issue, maybe others do too. Perhaps
there's common IP in use. Are you able to test other endpoints besides
this e1000e device with this setpci technique? Thanks,

Alex

> 0002:01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
> Thanks
> -Bharat
>
>
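
As a footnote to the dumps quoted above: config reads that come back as all
ones usually mean nothing answered the request, which is why the Vendor ID
is the natural first thing to check after the reset. A small Python sketch
of that check, using the endpoint rows from this thread; the helper names
(parse_lspci_x, vendor_device, looks_absent) are hypothetical and exist
only for this illustration.

```python
# Endpoint rows copied from the "lspci -x" output before the reset.
ENDPOINT_BEFORE = """\
00: 86 80 d3 10 06 04 10 00 00 00 00 02 10 00 00 00
10: 00 00 0c 40 00 00 00 40 01 00 00 00 00 00 0e 40
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 1f a0
30: 00 00 24 40 c8 00 00 00 00 00 00 00 63 01 00 00
"""

# Endpoint rows after setting and clearing the Secondary Bus Reset bit.
ENDPOINT_AFTER = """\
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
"""


def parse_lspci_x(dump):
    """Hypothetical helper: turn "lspci -x" rows into bytes indexed by offset."""
    data = bytearray()
    for line in dump.splitlines():
        _, _, row = line.partition(":")
        data += bytes.fromhex(row)
    return bytes(data)


def vendor_device(cfg):
    """Vendor ID lives at offset 0x00, Device ID at 0x02, both little-endian."""
    return cfg[0] | (cfg[1] << 8), cfg[2] | (cfg[3] << 8)


def looks_absent(cfg):
    """A Vendor ID of 0xffff is not a valid ID, so treat it as "no response"."""
    vendor, _ = vendor_device(cfg)
    return vendor == 0xFFFF


for label, dump in (("before", ENDPOINT_BEFORE), ("after", ENDPOINT_AFTER)):
    cfg = parse_lspci_x(dump)
    vendor, device = vendor_device(cfg)
    print(f"{label}: vendor={vendor:#06x} device={device:#06x} "
          f"absent={looks_absent(cfg)}")
```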