Re: [PATCH v5 3/3] perf x86: Exposing an Uncore unit to PMON for Intel Xeon® server platform

From: Greg KH
Date: Thu Feb 13 2020 - 08:08:35 EST


On Thu, Feb 13, 2020 at 03:36:44PM +0300, Sudarikov, Roman wrote:
> On 13.02.2020 1:56, Greg KH wrote:
> > On Wed, Feb 12, 2020 at 03:58:50PM -0500, Liang, Kan wrote:
> > >
> > > On 2/12/2020 12:31 PM, Sudarikov, Roman wrote:
> > > > On 11.02.2020 23:14, Greg KH wrote:
> > > > > On Tue, Feb 11, 2020 at 02:59:21PM -0500, Liang, Kan wrote:
> > > > > > On 2/11/2020 1:57 PM, Greg KH wrote:
> > > > > > > On Tue, Feb 11, 2020 at 10:42:00AM -0800, Andi Kleen wrote:
> > > > > > > > On Tue, Feb 11, 2020 at 09:15:44AM -0800, Greg KH wrote:
> > > > > > > > > On Tue, Feb 11, 2020 at 07:15:49PM +0300,
> > > > > > > > > roman.sudarikov@xxxxxxxxxxxxxxx wrote:
> > > > > > > > > > +static ssize_t skx_iio_mapping_show(struct device *dev,
> > > > > > > > > > +                struct device_attribute *attr, char *buf)
> > > > > > > > > > +{
> > > > > > > > > > +    struct pmu *pmu = dev_get_drvdata(dev);
> > > > > > > > > > +    struct intel_uncore_pmu *uncore_pmu =
> > > > > > > > > > +        container_of(pmu, struct intel_uncore_pmu, pmu);
> > > > > > > > > > +
> > > > > > > > > > +    struct dev_ext_attribute *ea =
> > > > > > > > > > +        container_of(attr, struct dev_ext_attribute, attr);
> > > > > > > > > > +    long die = (long)ea->var;
> > > > > > > > > > +
> > > > > > > > > > +    return sprintf(buf, "0000:%02x\n",
> > > > > > > > > > skx_iio_stack(uncore_pmu, die));
> > > > > > > > > If "0000:" is always the "prefix" of the output of
> > > > > > > > > this file, why have
> > > > > > > > > it at all as you always know it is there?
> > > > > > I think Roman only test with BIOS configured as single-segment. So he
> > > > > > hard-code the segment# here.
> > > > > >
> > > > > > I'm not sure if Roman can do some test with multiple-segment
> > > > > > BIOS. If not, I
> > > > > > think we should at least print a warning here.
> > > > > >
> > > > > > > > > What is ever going to cause that to change?
> > > > > > > > I think it's just to make it a complete PCI address.
> > > > > > > Is that what this really is?  If so, it's not a "complete" pci address,
> > > > > > > is it?  If it is, use the real pci address please.
> > > > > > I think we don't need a complete PCI address here. The attr is
> > > > > > to disclose
> > > > > > the mapping information between die and PCI BUS. Segment:BUS
> > > > > > should be good
> > > > > > enough.
> > > > > "good enough" for today, but note that you can not change the format of
> > > > > the data in the file in the future, you would have to create a new file.
> > > > > So I suggest at least try to future-proof it as much as possible if you
> > > > > _know_ this could change.
> > > > >
> > > > > Just use the full pci address, there's no reason not to, otherwise it's
> > > > > just confusing.
> > > > >
> > > > > thanks,
> > > > >
> > > > > greg k-h
> > > > Hi Greg,
> > > >
> > > > Yes, the "Segment:Bus" pair is enough to distinguish between different
> > > > Root ports.
> > > I think Greg suggests us to use full PCI address here.
> > >
> > > Hi Greg,
> > >
> > > There may be several devices are connected to IIO stack. There is no full
> > > PCI address for IIO stack.
> > Please define "full" for me. Please please don't tell me you are just
> > using a truncated version of the PCI address. I thought we got rid of
> > all of that nonsense 10 years ago...
> >
> > > I don't think we can list all of devices in the same IIO stack with full PCI
> > > address here either. It's not necessary, and only increase maintenance
> > > overhead.
> > Then what exactly _IS_ this number, if not the PCI address?
> >
> > Something made up to look almost like a PCI address, but not quite?
> > Somethine else?
> >
> > > I think we may have two options here.
> > >
> > > Option 1: Roman's proposal.The format of the file is "Segment:Bus". For the
> > > future I can see, the format doesn't need to be changed.
> > > E.g. $ls /sys/devices/uncore_<type>_<pmu_idx>/die0
> > > $0000:7f
> > Again, fake PCI address?
> Hi Greg,
>
> Actually, there are two reasons why we've chosen the "Segment:Root Bus"
> notion to
> represent Root port to IO PMU mapping:
> 1. it meets feature requirements to uniquely identify each Root Port on the
> system
> 2. that notion - "Segment:Root Bus" - is already used by the kernel to
> represent
> Root ports is sysfs; see commit 37d6a0a6f4700 and example below taken for
> Intel Xeon V5 (Skylake Server):
>
> # ls /sys/devices/ | grep pci
> pci0000:00
> pci0000:17
> pci0000:3a
> pci0000:5d
> pci0000:80
> pci0000:85
> pci0000:ae
> pci0000:d7
>
> Having full conventional PCI address in the form of
> "Segment:Bus:Device.Function"
> is just not required to distinguish one Root Bus from the other.
> But if there is any other agreement regarding the way how PCI Root ports are
> supposed to show up in the sysfs then please let us know.

Ok, if you all are set on this, and feel like it will work properly for
the next few years without needing to be changed, I can live with it.
Thanks for the explanations.

greg k-h