Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event

From: Shuai Xue
Date: Sat Jul 19 2025 - 01:23:54 EST




在 2025/7/19 00:35, Bjorn Helgaas 写道:
On Fri, Jul 18, 2025 at 01:29:18PM +0800, Shuai Xue wrote:
在 2025/7/18 11:46, Matthew W Carlis 写道:
On Thu, Jul 17, 2025 Bjorn Helgaas wrote
So I think your idea of adding current link speed/width to the "Link
Up" event is still on the table, and that does sound useful to me.

We're already reading the link status register here to check DLLA so
it would be nice. I guess if everything is healthy we're probably already
at the maximum speed by this point.

In the future we might add another tracepoint when we enumerate the
device and know the Vendor/Device ID.

I think we might have someone who would be interested in doing it.

IIUC, the current hotplug event (or presence event) is enough for Matthew.
and we would like a new tracepoing for link speed change which reports
speeds.

For hotplug event, I plan to send a new version to

1. address Bjorn' concerns about event strings by removing its spaces.

#define PCI_HOTPLUG_EVENT \
EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP") \
EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN") \
EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT") \
EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "PCI_HOTPLUG_CARD_NOT_PRESENT")

2. address Ilpo comments by moving pci_hp_event to a common place
(include/trace/events/pci.h) so that the new comming can also use it.

For link speed change event (perhaps named as pci_link_event),
I plan to send a seperate patch, which provides:

TP_STRUCT__entry(
__string( port_name, port_name )
__field( unsigned char, cur_bus_speed )
__field( unsigned char, max_bus_speed )
__field( unsigned char, width )
__field( unsigned int, flit_mode )
__field( unsigned char, reason )
),

The reason field is from Lukas ideas which indicates why the link speed
changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.

Are you happy with above changes?

Seems good to me.

What do you plan for PCI_HOTPLUG_LINK_UP? It would be nice to have
the link info there since that's sort of a link speed change itself.

Bjorn

Hi, Bjorn,

IMMO, link information is best included in the pci_link_event tracepoint,
rather than in the hotplug event itself.

For example, with the a device hotpluged, the trace output looks like this:

$ cat /sys/kernel/debug/tracing/trace_pipe

<...>-120 [002] ..... 104.864051: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT

<...>-120 [002] ..... 104.864081: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP

irq/57-pciehp-120 [002] ..... 104.990434: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:5

irq/57-pciehp-120 [002] ..... 104.992377: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:0

Here, reason 5 corresponds to PCIe hotplug, and reason 0 indicates a PCIe link retrain.

This approach keeps the separation of concerns clear: hotplug events are
tracked by pci_hp_event, while all link-related information, including
speed changes and the reason for the change, is handled by
pci_link_event.

Thanks.
Shuai