Re: [RFC PATCH 7/9] PCI/AER: Add RCEC AER handling

From: Sean V Kelley
Date: Mon Jul 27 2020 - 11:19:42 EST


On 27 Jul 2020, at 5:22, Jonathan Cameron wrote:

On Fri, 24 Jul 2020 10:22:21 -0700
Sean V Kelley <sean.v.kelley@xxxxxxxxx> wrote:

The Root Complex Event Collectors(RCEC) appear as peers to Root Ports
and also have the AER capability. So add RCEC support to the current AER
service driver and attach the AER service driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>
Signed-off-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx>

A few questions and comments for this patch.

See inline.

Jonathan


---
drivers/pci/pcie/aer.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f1bf06be449e..7cc430c74c46 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -303,7 +303,7 @@ int pci_aer_raw_clear_status(struct pci_dev *dev)
return -EIO;

port_type = pci_pcie_type(dev);
- if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
+ if (port_type == PCI_EXP_TYPE_ROOT_PORT || port_type == PCI_EXP_TYPE_RC_EC) {
pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &status);
pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, status);
}
@@ -389,6 +389,12 @@ void pci_aer_init(struct pci_dev *dev)
pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_ERR, sizeof(u32) * n);

pci_aer_clear_status(dev);
+
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) {
+ if (!pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC))
+ return;
+ pci_info(dev, "AER: RCEC CAP FOUND and cap_has_rtctl = %d\n", n);

It feels like failing to find an RC_EC extended cap in a RCEC deserved
a nice strong error message. Perhaps this isn't the right place to do it
though. For that matter, why are we checking for it here?

Sorry, Iâve left an in-development output in the code. Will replace with a check with more meaningful output elsewhere.


+ }
}

void pci_aer_exit(struct pci_dev *dev)
@@ -577,7 +583,8 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
a == &dev_attr_aer_rootport_total_err_fatal.attr ||
a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&

It is a bit ugly to have these called rootport_total_err etc for the rcec.
Perhaps we should just add additional attributes to reflect we are looking at
an RCEC?

I was trying to avoid any renaming to reduce churn as I did with my first patch for ACPI / CLX_OSC support.
Will take a look.


- pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+ ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
return 0;

return a->mode;
@@ -894,7 +901,10 @@ static bool find_source_device(struct pci_dev *parent,
if (result)
return true;

- pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+ if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
+ pcie_walk_rcec(parent, find_device_iter, e_info);
+ else
+ pci_walk_bus(parent->subordinate, find_device_iter, e_info);

if (!e_info->error_dev_num) {
pci_info(parent, "can't find device of ID%04x\n", e_info->id);
@@ -1030,6 +1040,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
if (!(info->status & ~info->mask))
return 0;
} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
info->severity == AER_NONFATAL) {

@@ -1182,6 +1193,8 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
int type = pci_pcie_type(dev);

if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+ (type == PCI_EXP_TYPE_RC_EC) ||
+ (type == PCI_EXP_TYPE_RC_END) ||

Why add RC_END here?

Iâm not clear on your question. Errors can come from RCEC or RCiEPs. We still need to enable reporting by the RCiEPs.


(type == PCI_EXP_TYPE_UPSTREAM) ||
(type == PCI_EXP_TYPE_DOWNSTREAM)) {
if (enable)
@@ -1206,9 +1219,11 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
{
set_device_error_reporting(dev, &enable);

- if (!dev->subordinate)
- return;
- pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+ pcie_walk_rcec(dev, set_device_error_reporting, &enable);
+ else if (dev->subordinate)
+ pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+
}

/**
@@ -1306,6 +1321,11 @@ static int aer_probe(struct pcie_device *dev)
struct device *device = &dev->device;
struct pci_dev *port = dev->port;

+ /* Limit to Root Ports or Root Complex Event Collectors */
+ if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+ (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+ return -ENODEV;
+
rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
if (!rpc)
return -ENOMEM;
@@ -1362,7 +1382,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)

static struct pcie_port_service_driver aerdriver = {
.name = "aer",
- .port_type = PCI_EXP_TYPE_ROOT_PORT,
+ .port_type = PCIE_ANY_PORT,

Why this particular change? Seems that is a lot wider than simply
adding RCEC. Obviously we'll then drop out in the aer_probe but it
is still rather inelegant.

In order to extend the service drivers to non-root-port devices (i.e., RCEC), the simple path appeared to only require setting the type to ANY_PORT and catching the needed types arriving in the probe. Would you prefer extending to a type2? Iâm not sure how one is more elegant than another but open to that approach. However, this seems to require less code perhaps and seems consistent with most âdrop-outâ conditional patterns in the kernel. The same applies to pme.

Thanks,

Sean



.service = PCIE_PORT_SERVICE_AER,

.probe = aer_probe,