Re: [PATCH v5 3/3] perf x86: Exposing an Uncore unit to PMON for Intel Xeon server platform

From: Sudarikov, Roman
Date: Thu Feb 13 2020 - 07:36:55 EST


On 13.02.2020 1:56, Greg KH wrote:
On Wed, Feb 12, 2020 at 03:58:50PM -0500, Liang, Kan wrote:

On 2/12/2020 12:31 PM, Sudarikov, Roman wrote:
On 11.02.2020 23:14, Greg KH wrote:
On Tue, Feb 11, 2020 at 02:59:21PM -0500, Liang, Kan wrote:
On 2/11/2020 1:57 PM, Greg KH wrote:
On Tue, Feb 11, 2020 at 10:42:00AM -0800, Andi Kleen wrote:
On Tue, Feb 11, 2020 at 09:15:44AM -0800, Greg KH wrote:
On Tue, Feb 11, 2020 at 07:15:49PM +0300,
roman.sudarikov@xxxxxxxxxxxxxxx wrote:
+static ssize_t skx_iio_mapping_show(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct intel_uncore_pmu *uncore_pmu =
+		container_of(pmu, struct intel_uncore_pmu, pmu);
+
+	struct dev_ext_attribute *ea =
+		container_of(attr, struct dev_ext_attribute, attr);
+	long die = (long)ea->var;
+
+	return sprintf(buf, "0000:%02x\n",
+		       skx_iio_stack(uncore_pmu, die));
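The show() routine above relies on two container_of() lookups: from the embedded `struct pmu` back to `struct intel_uncore_pmu`, and from the embedded `struct device_attribute` back to `struct dev_ext_attribute`, whose `var` field carries the die index. A minimal userspace sketch of the second lookup, assuming simplified stand-in structs (not the kernel's full definitions) and a `container_of` macro built on `offsetof`, which is the same idea as the kernel macro minus its type checking. The helper `die_from_attr()` is hypothetical.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in: only the field this sketch needs. */
struct device_attribute {
	const char *name;
};

/* Same member layout as the kernel's struct dev_ext_attribute. */
struct dev_ext_attribute {
	struct device_attribute attr;	/* member pointer handed to show() */
	void *var;			/* die index stored as a pointer value */
};

/* Hypothetical helper: recover the die index the way the patch does. */
static long die_from_attr(struct device_attribute *attr)
{
	struct dev_ext_attribute *ea =
		container_of(attr, struct dev_ext_attribute, attr);

	return (long)ea->var;
}
```

For example, with `struct dev_ext_attribute ea = { { "die1" }, (void *)1L };`, calling `die_from_attr(&ea.attr)` yields 1.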
If "0000:" is always the "prefix" of the output of this file, why have
it at all, as you always know it is there?
I think Roman only tested with the BIOS configured as single-segment, so he
hard-coded the segment# here.

I'm not sure whether Roman can do some tests with a multiple-segment BIOS.
If not, I think we should at least print a warning here.
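Rather than hard-coding the "0000:" segment (or only warning about it), the attribute could be formatted from the segment number actually reported for the stack's root bus. A minimal userspace sketch of that formatting; `format_die_mapping()` and its parameters are hypothetical, and the assumption that the kernel side would obtain the segment via something like `pci_domain_nr()` on the root bus is not something the patch does.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): format one die's IIO stack
 * mapping as "Segment:Bus" from the real segment number instead of a
 * hard-coded "0000:" prefix. Returns what snprintf() returns, i.e. the
 * length the full string needs (excluding the terminating NUL).
 */
static int format_die_mapping(char *buf, size_t len,
			      unsigned int segment, unsigned int bus)
{
	/* At least 4 hex digits of segment, 2 hex digits of bus, newline */
	return snprintf(buf, len, "%04x:%02x\n", segment, bus);
}
```

On a single-segment system this still prints `0000:7f`; on a multi-segment one it would print, for example, `0001:7f` instead of silently reporting segment 0.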

What is ever going to cause that to change?
I think it's just to make it a complete PCI address.
Is that what this really is? If so, it's not a "complete" pci address,
is it? If it is, use the real pci address please.
I think we don't need a complete PCI address here. The attr is meant to
disclose the mapping between a die and its PCI bus. Segment:Bus should be
good enough.
"good enough" for today, but note that you cannot change the format of
the data in the file in the future; you would have to create a new file.
So I suggest at least trying to future-proof it as much as possible if you
_know_ this could change.

Just use the full pci address, there's no reason not to, otherwise it's
just confusing.

thanks,

greg k-h
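For comparison with the suggestion to use the full PCI address, a sketch of what that string looks like: the conventional "Segment:Bus:Device.Function" form, which is how the kernel names PCI devices in sysfs (e.g. `0000:00:1f.3`). The helper `format_pci_address()` is hypothetical; `devfn` is split with the same bit layout as the kernel's `PCI_SLOT()`/`PCI_FUNC()` macros (device in bits 7:3, function in bits 2:0).

```c
#include <stdio.h>
#include <stddef.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define DEVFN_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical helper: format a full "Segment:Bus:Device.Function"
 * address such as "0000:7f:00.0". Returns what snprintf() returns.
 */
static int format_pci_address(char *buf, size_t len, unsigned int segment,
			      unsigned int bus, unsigned int devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%x", segment, bus,
			DEVFN_SLOT(devfn), DEVFN_FUNC(devfn));
}
```

The catch raised later in the thread is that an IIO stack is a bus, not a single device, so there is no natural Device.Function pair to fill in.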
Hi Greg,

Yes, the "Segment:Bus" pair is enough to distinguish between different
Root ports.
I think Greg is suggesting that we use the full PCI address here.

Hi Greg,

There may be several devices connected to an IIO stack. There is no full
PCI address for an IIO stack.
Please define "full" for me. Please please don't tell me you are just
using a truncated version of the PCI address. I thought we got rid of
all of that nonsense 10 years ago...

I don't think we can list all of the devices in the same IIO stack with
their full PCI addresses here either. It's not necessary, and it only
increases maintenance overhead.
Then what exactly _IS_ this number, if not the PCI address?

Something made up to look almost like a PCI address, but not quite?
Something else?

I think we may have two options here.

Option 1: Roman's proposal. The format of the file is "Segment:Bus". For the
foreseeable future, the format doesn't need to change.
E.g. $ cat /sys/devices/uncore_<type>_<pmu_idx>/die0
0000:7f
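To make the Option 1 format concrete, a sketch of how a userspace tool might parse one die file's contents back into numbers. `parse_die_mapping()` is hypothetical, and it takes the file contents as a string so the sketch does not depend on a particular sysfs path.

```c
#include <stdio.h>

/*
 * Hypothetical parser for the proposed "Segment:Bus" contents of a
 * .../die<N> file, e.g. "0000:7f\n". Returns 0 on success, -1 if the
 * text does not match the expected format.
 */
static int parse_die_mapping(const char *text, unsigned int *segment,
			     unsigned int *bus)
{
	int consumed = 0;

	/* %n records how many characters matched, to reject trailing junk */
	if (sscanf(text, "%4x:%2x%n", segment, bus, &consumed) != 2)
		return -1;
	if (text[consumed] != '\0' && text[consumed] != '\n')
		return -1;
	return 0;
}
```

With this parser, `"0000:7f\n"` yields segment 0 and bus 0x7f, while anything with extra fields after the bus (such as `"0000:7f:-1:-1"`) is rejected.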
Again, fake PCI address?
Hi Greg,

Actually, there are two reasons why we've chosen the "Segment:Root Bus" notation to
represent the Root port to IO PMU mapping:
1. It meets the feature requirement to uniquely identify each Root Port on the system.
2. That notation, "Segment:Root Bus", is already used by the kernel to represent
Root ports in sysfs; see commit 37d6a0a6f4700 and the example below, taken from
Intel Xeon V5 (Skylake Server):

# ls /sys/devices/ | grep pci
pci0000:00
pci0000:17
pci0000:3a
pci0000:5d
pci0000:80
pci0000:85
pci0000:ae
pci0000:d7
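The listing suggests how userspace could join the two pieces of information: a "Segment:Bus" value read from a die file maps directly onto a root bus directory name. A sketch, assuming the `pci<segment>:<bus>` naming shown in the listing (the kernel names host bridges with a `"pci%04x:%02x"` format); `root_bus_sysfs_path()` is a hypothetical helper.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: turn a "Segment:Bus" pair (as read from a die<N>
 * file) into the matching root bus directory under /sys/devices, using
 * the "pci<segment>:<bus>" naming from the listing.
 * Returns 0 on success, -1 if the path did not fit in buf.
 */
static int root_bus_sysfs_path(char *buf, size_t len, unsigned int segment,
			       unsigned int bus)
{
	int n = snprintf(buf, len, "/sys/devices/pci%04x:%02x",
			 segment, bus);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, segment 0 and bus 0x17 give `/sys/devices/pci0000:17`, one of the entries listed.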

Having the full conventional PCI address in the form "Segment:Bus:Device.Function"
is simply not required to distinguish one Root Bus from another.
But if there is a different convention for how PCI Root ports are
supposed to show up in sysfs, please let us know.

Thanks,
Roman
Option 2: Use the full PCI address, but use -1 to mark the device and
function fields as invalid.
E.g. $ cat /sys/devices/uncore_<type>_<pmu_idx>/die0
0000:7f:-1:-1
"Invalid"? Why? Why not just refer to the 0:0 device, as that's the
bus "root" address (or whatever it's called, I can't remember PCI stuff
all that well...)

Should we use the format in option 2?
What could userspace do with a -1 -1 address?

thanks,

greg k-h
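To illustrate the question of what userspace could do with a -1 -1 address: a strict parser for the conventional "Segment:Bus:Device.Function" form has no way to accept the Option 2 placeholder, so every consumer would have to special-case it. A sketch with a hypothetical validator that checks each field character by character; it deliberately avoids `sscanf()`'s `%x`, which accepts an optional leading sign and so would not reject "-1" on its own. It checks only the textual shape, not whether the device and function numbers are in range.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if s[0..n-1] are all hex digits; stops early at a NUL. */
static int all_hex(const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/*
 * Hypothetical validator for a full PCI address "SSSS:BB:DD.F".
 * Returns 1 if the string has exactly that shape (optionally followed
 * by a newline), 0 otherwise. Checks short-circuit left to right, so a
 * short string is never read past its terminating NUL.
 */
static int is_full_pci_address(const char *s)
{
	return all_hex(s, 4) && s[4] == ':' &&
	       all_hex(s + 5, 2) && s[7] == ':' &&
	       all_hex(s + 8, 2) && s[10] == '.' &&
	       all_hex(s + 11, 1) &&
	       (s[12] == '\0' || s[12] == '\n');
}
```

Under this check, `"0000:7f:00.0"` is accepted while both `"0000:7f:-1:-1"` and the bare `"0000:7f"` are rejected.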