Re: [PATCH v10 14/17] cxl/pci: Introduce CXL Endpoint protocol error handlers
From: Jonathan Cameron
Date: Fri Jun 27 2025 - 07:52:44 EST
On Thu, 26 Jun 2025 17:42:49 -0500
Terry Bowman <terry.bowman@xxxxxxx> wrote:
> CXL Endpoint protocol errors are currently handled using PCI error
> handlers. The CXL Endpoint requires CXL specific handling in the case of
> uncorrectable error (UCE) handling not provided by the PCI handlers.
>
> Add CXL specific handlers for CXL Endpoints. Rename the existing
> cxl_error_handlers to be pci_error_handlers to more correctly indicate
> the error type and follow naming consistency.
>
> The PCI handlers will be called if the CXL device is not trained for
> alternate protocol (CXL). Update the CXL Endpoint PCI handlers to call the
> CXL UCE handlers.
>
> The existing EP UCE handler includes checks for various results. These are
> no longer needed because CXL UCE recovery will not be attempted. Implement
> cxl_handle_ras() to return PCI_ERS_RESULT_NONE or PCI_ERS_RESULT_PANIC. The
> CXL UCE handler is called by cxl_do_recovery() that acts on the return
> value. In the case of the PCI handler path, call panic() if the result is
> PCI_ERS_RESULT_PANIC.
>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
A few minor comments inline.
J
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 887b54cf3395..7209ffb5c2fe 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
>
> - scoped_guard(device, dev) {
> - if (!dev->driver) {
> +pci_ers_result_t cxl_error_detected(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + struct device *cxlmd_dev = &cxlds->cxlmd->dev;
> + pci_ers_result_t ue;
> +
> + scoped_guard(device, cxlmd_dev) {
I think there is nothing much happening after this (maybe introduced in later
patches in which case ignore this comment).
So can you just use a guard and reduce the indent of the rest?
> +
> + if (!cxlmd_dev->driver) {
> dev_warn(&pdev->dev,
> "%s: memdev disabled, abort error handling\n",
> dev_name(dev));
> - return PCI_ERS_RESULT_DISCONNECT;
> + return PCI_ERS_RESULT_PANIC;
> }
>
> if (cxlds->rcd)
> @@ -881,29 +888,23 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> ue = cxl_handle_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
little hard to tell from this code blob but can you return here?
> }
>
> -
> - switch (state) {
> - case pci_channel_io_normal:
> - if (ue) {
> - device_release_driver(dev);
> - return PCI_ERS_RESULT_NEED_RESET;
> - }
> - return PCI_ERS_RESULT_CAN_RECOVER;
> - case pci_channel_io_frozen:
> - dev_warn(&pdev->dev,
> - "%s: frozen state error detected, disable CXL.mem\n",
> - dev_name(dev));
> - device_release_driver(dev);
> - return PCI_ERS_RESULT_NEED_RESET;
> - case pci_channel_io_perm_failure:
> - dev_warn(&pdev->dev,
> - "failure state error detected, request disconnect\n");
> - return PCI_ERS_RESULT_DISCONNECT;
> - }
> - return PCI_ERS_RESULT_NEED_RESET;
> + return ue;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");