Re: [PATCH 2/5] cxl/pci: Discover and cache pointer to RCD dport's PCIe AER capability

From: Terry Bowman
Date: Tue Oct 25 2022 - 12:42:48 EST




On 10/22/22 16:45, Dan Williams wrote:
> Terry Bowman wrote:
>> CXL downport PCIe AER information needs to be logged during error handling.
>> The RCD downport/upport does not have a BDF and is not PCI enumerable. As a
>> result the CXL PCIe driver is not aware of the AER in 'PCI Express'
>> capability located in the RCRB downport/upport. Logic must be introduced to
>> use the downport/upport AER information.
>
> Nice, I am happy to see this work get started.
>
>>
>> Update the CXL driver to find the downport's PCIe AER capability and cache
>> a pointer for using later. First, find the RCRB to provide the
>> downport/upport memory region. The downport/upport are mapped as MMIO not
>> PCI config space. Use readl/writel/readq/writeq as required by the CXL spec
>> to find and operate on the AER registers.[1]
>>
>> Also, add function to detect if the device is a CXL1.1 RCD device.
>>
>> [1] CXL3.0, 8.2 'Memory Mapped Registers'
>>
>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>> ---
>> drivers/cxl/acpi.c | 56 ++++++++++++++
>> drivers/cxl/core/regs.c | 1 +
>> drivers/cxl/cxl.h | 9 +++
>> drivers/cxl/cxlmem.h | 2 +
>> drivers/cxl/mem.c | 2 +
>> drivers/cxl/pci.c | 158 ++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 228 insertions(+)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index bf251a27e460..5d543c789e8d 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -232,6 +232,7 @@ struct cxl_chbs_context {
>> struct device *dev;
>> unsigned long long uid;
>> struct acpi_cedt_chbs chbs;
>> + resource_size_t chbcr;
>> };
>>
>> static int cxl_get_chbs(union acpi_subtable_headers *header, void *arg,
>> @@ -417,6 +418,61 @@ static void remove_cxl_resources(void *data)
>> }
>> }
>>
>> +static const struct acpi_device_id cxl_host_ids[] = {
>> + { "ACPI0016", 0 },
>> + { "PNP0A08", 0 },
>> + { },
>> +};
>> +
>> +static int __cxl_get_rcrb(union acpi_subtable_headers *header, void *arg,
>> + const unsigned long end)
>> +{
>> + struct cxl_chbs_context *ctx = arg;
>> + struct acpi_cedt_chbs *chbs;
>> +
>> + if (ctx->chbcr)
>> + return 0;
>> +
>> + chbs = (struct acpi_cedt_chbs *)header;
>> +
>> + if (ctx->uid != chbs->uid)
>> + return 0;
>> +
>> + if (chbs->cxl_version != ACPI_CEDT_CHBS_VERSION_CXL11)
>> + return 0;
>> +
>> + if (chbs->length != SZ_8K)
>> + return 0;
>> +
>> + ctx->chbcr = chbs->base;
>> +
>> + return 0;
>> +}
>
> This seems redundant with component register enumeration that was
> already added in Robert's patches.
>

Noted. Plese see my next response below.

>> +
>> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd)
>> +{
>> + struct pci_host_bridge *host = NULL;
>> + struct cxl_chbs_context ctx = {0};
>> + struct cxl_dport *dport;
>> + struct cxl_port *port;
>> +
>> + port = cxl_mem_find_port(cxlmd, NULL);
>> + if (!port)
>> + return 0;
>> +
>> + dport = port->parent_dport;
>> + ctx.uid = dport ? dport->port_id : 0;
>> + if (!dport)
>> + return 0;
>> +
>> + acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, __cxl_get_rcrb, &ctx);
>> +
>> + dev_dbg(&host->dev, "RCRB found: 0x%08llx\n", (u64)ctx.chbcr);
>> +
>> + return ctx.chbcr;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_rcrb, CXL);
>
> While CXL to date has been tied ACPI platforms there is no requirement
> that CXL be ACPI based. Given that other coherent bus specifications
> that were deployed on PowerPC have now joined the CXL consortium there
> is an increasing chance of CXL appearing on an Open Firmware based
> platforms. Even if that does not come to pass, the discipline of clear
> separation between platform code and PCI/CXL mechanisms leads to cleaner
> code organization.
>
> All that to say, no, cxl_acpi cannot export functions for other cxl
> modules to depend upon. Instead it needs to publish these details in the
> CXL objects that it registers.
>

Ok. Ill rework such that ACPI functions are not exported. Adding the recommended
caching 'rcrb_phys' will help recfactoring out the exported function. I'll
cache the RCRB to 'rcrb_phys' during enumeration instead of calling a helper
function for every query.

> In this case my expectation is that cxl_acpi registers a dport decorated
> with the extra RCH information. Something like:
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..b42f4759743b 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -499,12 +499,19 @@ cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
> * @port_id: unique hardware identifier for dport in decoder target list
> * @component_reg_phys: downstream port component registers
> * @port: reference to cxl_port that contains this downstream port
> + * @is_rch: enable RCH vs VH enumeration (see CXL 3.0 9.11.8)
> */
> struct cxl_dport {
> struct device *dport;
> int port_id;
> resource_size_t component_reg_phys;
> struct cxl_port *port;
> + bool is_rch;
> +};
> +
> +struct cxl_rch_dport {
> + struct cxl_dport dport;
> + resource_size_t rcrb_phys;
> };
>

The same is needed for uport as well, correct ?

> /**
>
> Then, when cxl_mem notices that the memdev is being produced by an
> RCIEP, it can skip devm_cxl_enumerate_ports() and jump straight to
> cxl_mem_find_port(). That will return this dport with the rcrb base
> where cxl_mem can arrange the AER handling. Likely we will need some
> notification mechanism to route Root Complex AER events to cxl_acpi,
> cxl_pci, and/or cxl_mem to optionally add the CXL RAS data to the log.
>
Isn't the notification mechanism through the AER interrupt processing?
I will have more related comments in patch 3/5.

>> +
>> /**
>> * add_cxl_resources() - reflect CXL fixed memory windows in iomem_resource
>> * @cxl_res: A standalone resource tree where each CXL window is a sibling
>> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
>> index ec178e69b18f..0d4f633e5c01 100644
>> --- a/drivers/cxl/core/regs.c
>> +++ b/drivers/cxl/core/regs.c
>> @@ -184,6 +184,7 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>>
>> return ret_val;
>> }
>> +EXPORT_SYMBOL_NS_GPL(devm_cxl_iomap_block, CXL);
>>
>> int cxl_map_component_regs(struct pci_dev *pdev,
>> struct cxl_component_regs *regs,
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index ac8998b627b5..7d507ab80a78 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -204,6 +204,14 @@ struct cxl_register_map {
>> };
>> };
>>
>> +struct cxl_memdev;
>> +int cxl_pci_probe_dport(struct cxl_memdev *cxlmd);
>> +
>> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd);
>> +
>> +void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>> + resource_size_t length);
>> +
>> void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>> struct cxl_component_reg_map *map);
>> void cxl_probe_device_regs(struct device *dev, void __iomem *base,
>> @@ -549,6 +557,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
>> return port->uport == port->dev.parent;
>> }
>>
>> +resource_size_t cxl_get_rcrb(struct cxl_memdev *cxlmd);
>> bool is_cxl_port(struct device *dev);
>> struct cxl_port *to_cxl_port(struct device *dev);
>> struct pci_bus;
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 88e3a8e54b6a..079db2e15acc 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -242,6 +242,8 @@ struct cxl_dev_state {
>> u64 next_volatile_bytes;
>> u64 next_persistent_bytes;
>>
>> + struct cxl_register_map aer_map;
>> +
>> resource_size_t component_reg_phys;
>> u64 serial;
>>
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 64ccf053d32c..d1e663be43c2 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -74,6 +74,8 @@ static int cxl_mem_probe(struct device *dev)
>> if (rc)
>> return rc;
>>
>> + cxl_pci_aer_init(cxlmd);
>> +
>> parent_port = cxl_mem_find_port(cxlmd, &dport);
>> if (!parent_port) {
>> dev_err(dev, "CXL port topology not found\n");
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index faeb5d9d7a7a..2287b5225862 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -35,6 +35,15 @@
>> (readl((cxlds)->regs.mbox + CXLDEV_MBOX_CTRL_OFFSET) & \
>> CXLDEV_MBOX_CTRL_DOORBELL)
>>
>> +/* PCI 5.0 - 7.8.4 Advanced Error Reporting Extended Capability */
>> +#define PCI_AER_CAP_SIZE 0x48
>> +
>> +/* CXL 3.0 - 8.2.1.3.3, Offset to DVSEC Port Status */
>> +#define CXL_DVSEC_PORT_STATUS_OFF 0xE
>> +
>> +/* CXL 3.0 - 8.2.1.3.3 */
>> +#define CXL_DVSEC_VH_SUPPORT 0x20
>> +
>> /* CXL 2.0 - 8.2.8.4 */
>> #define CXL_MAILBOX_TIMEOUT_MS (2 * HZ)
>>
>> @@ -428,6 +437,155 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
>> }
>> }
>>
>> +static resource_size_t cxl_get_dport_ext_cap(struct cxl_memdev *cxlmd, u32 cap_id)
>> +{
>> + resource_size_t rcrb, offset;
>> + void *rcrb_mapped;
>> + u32 cap_hdr;
>> +
>> + rcrb = cxl_get_rcrb(cxlmd);
>> + if (!rcrb)
>> + return 0;
>> +
>> + rcrb_mapped = ioremap(rcrb, SZ_4K);
>> + if (!rcrb_mapped)
>> + return 0;
>> +
>> + offset = PCI_CFG_SPACE_SIZE;
>> + cap_hdr = readl(rcrb_mapped + offset);
>> +
>> + while (PCI_EXT_CAP_ID(cap_hdr)) {
>> + if (PCI_EXT_CAP_ID(cap_hdr) == cap_id)
>> + break;
>> +
>> + offset = PCI_EXT_CAP_NEXT(cap_hdr);
>> + if (offset == 0)
>> + break;
>> +
>> + cap_hdr = readl(rcrb_mapped + offset);
>> + }
>> + iounmap((void *)rcrb_mapped);
>
> The memdev owns the upstream port RAS capability, why is it mapping the
> downstream port ras capability?
>

You're right, this needs to be changed to read the upport's extended
capabilities.

>> +
>> + if (PCI_EXT_CAP_ID(cap_hdr) != cap_id)
>> + return 0;
>> +
>> + pr_debug("Found capability %X @ %llX (%X)\n",
>> + cap_id, rcrb + offset, cap_hdr);
>> +
>> + return rcrb + offset;
>> +}
>> +
>> +bool is_rcd(struct cxl_memdev *cxlmd)
>> +{
>> + struct pci_dev *pdev;
>> + resource_size_t dvsec;
>> + void *dvsec_mapped;
>> + u32 dvsec_data;
>> +
>> + if (!dev_is_pci(cxlmd->cxlds->dev))
>> + return false;
>
> Just use cxlmd->dev.parent, no need to route through cxlds for this.
>

Ok

>> +
>> + pdev = to_pci_dev(cxlmd->cxlds->dev);
>> +
>> + /*
>> + * 'CXL devices operating in this mode always set the Device/Port
>> + * Type field in the PCI Express Capabilities register to RCiEP.'
>> + * - CXL3.0 9.11.1 'RCD Mode'
>> + */
>> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_END)
>> + return false;
>> +
>> + /*
>> + * Check if VH is enabled
>> + * - CXL3.0 8.2.1.3.1 'DVSEC Flex Bus Port Capability'
>> + */
>> + dvsec = cxl_get_dport_ext_cap(cxlmd, PCI_EXT_CAP_ID_DVSEC);
>> + if (!dvsec)
>> + return false;
>> +
>> + dvsec_mapped = ioremap(dvsec, SZ_4K);
>
> No ioremap error handling?
>

I'll add.

>> + dvsec_data = readl(dvsec_mapped + CXL_DVSEC_PORT_STATUS_OFF);
>> + iounmap(dvsec_mapped);
>> + if (dvsec_data & CXL_DVSEC_VH_SUPPORT)
>> + return false;
>
> There's no such thing as a root-complex-integrated endpoint in CXL VH
> mode, right? Is this not redundant?
>

Youre correct.

'CXL Root Ports may be directly connected to a CXL device that is not an eRCD, or a CXL
Switch. These Root Ports spawn a CXL Virtual Hierarchy (VH). Enumeration within a
CXL VH is described below. These CXL devices appear as a standard PCIe Endpoints with a Type 0 Header.'
-cxl3.0 9.12.2 'CXL Virtual Hierarchy'

I will remove the DVSEC check.

>> +
>> + return true;
>> +}
>> +
>> +#define PCI_CAP_ID(header) (header & 0x000000ff)
>> +#define PCI_CAP_NEXT(header) ((header >> 8) & 0xff)
>> +
>> +static resource_size_t cxl_get_dport_cap(struct cxl_memdev *cxlmd, int cap_id)
>> +{
>> + resource_size_t offset, rcrb;
>> + void *rcrb_mapped;
>> + u32 cap_hdr;
>> +
>> + rcrb = cxl_get_rcrb(cxlmd);
>> + if (!rcrb)
>> + return 0;
>> +
>> + rcrb_mapped = ioremap(rcrb, SZ_4K);
>> + if (!rcrb_mapped)
>> + return 0;
>> +
>> + offset = readl(rcrb_mapped + PCI_CAPABILITY_LIST);
>> + cap_hdr = readl(rcrb_mapped + offset);
>> +
>> + while (PCI_CAP_ID(cap_hdr)) {
>> + if (PCI_CAP_ID(cap_hdr) == cap_id)
>> + break;
>> +
>> + offset = PCI_CAP_NEXT(cap_hdr);
>> + if (offset == 0)
>> + break;
>> +
>> + cap_hdr = readl(rcrb_mapped + offset);
>> + }
>> + iounmap((void *)rcrb_mapped);
>
> All of this mapping and unmapping of the RCRB needs to be centralized in
> one place and owned by one driver for the downstream portion and one
> driver for the upstream portion. That *feels* like cxl_acpi for the
> downstream side and cxl_port for the upstream side (when it drives the
> endpoint port registered by cxl_mem). It should also be protected by
> request_region() to make sure multiple agents are not stepping on each
> other's toes.
>

Ok. I'll look into this and make the change.

Thank you for this review.

Regards,
Terry Bowman

>> +
>> + if (PCI_CAP_ID(cap_hdr) != cap_id)
>> + return 0;
>> +
>> + pr_debug("Found capability %X @ %llX (%X)\n",
>> + cap_id, rcrb + offset, cap_hdr);
>> +
>> + return rcrb + offset;
>> +}
>> +
>> +static int cxl_setup_dport_aer(struct cxl_memdev *cxlmd, resource_size_t cap_base)
>> +{
>> + struct cxl_register_map *map = &cxlmd->cxlds->aer_map;
>> + struct pci_dev *pdev = to_pci_dev(&cxlmd->dev);
>> +
>> + if (!cap_base)
>> + return -ENODEV;
>> +
>> + map->base = devm_cxl_iomap_block(&pdev->dev, cap_base,
>> + PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1);
>> + if (!map->base)
>> + return -ENOMEM;
>> +
>> + return 0;
>> +}
>> +
>> +void cxl_pci_aer_init(struct cxl_memdev *cxlmd)
>> +{
>> + resource_size_t cap_base;
>> +
>> + /* CXL2.0 is enumerable and will use AER attached to `struct pci_dev` */
>> + if (!is_rcd(cxlmd))
>> + return;
>> +
>> + /*
>> + * Read base address of the PCI express cap. Cache the cap's
>> + * PCI_EXP_DEVCTL and PCI_EXP_DEVSTA for AER control and status.
>> + */
>> + cap_base = cxl_get_dport_cap(cxlmd, PCI_CAP_ID_EXP);
>> + cxl_setup_dport_aer(cxlmd, cap_base);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_pci_aer_init, CXL);
>> +
>> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> {
>> struct cxl_register_map map;
>> --
>> 2.34.1
>>
>
>