Re: [PATCH v10 06/17] PCI/AER: Dequeue forwarded CXL error

From: Jonathan Cameron
Date: Thu Jul 03 2025 - 06:11:42 EST


On Wed, 2 Jul 2025 12:56:29 -0500
"Bowman, Terry" <terry.bowman@xxxxxxx> wrote:

> On 7/1/2025 6:04 PM, Dave Jiang wrote:
> >
> > On 6/26/25 3:42 PM, Terry Bowman wrote:
> >> The AER driver is now designed to forward CXL protocol errors to the CXL
> >> driver. Update the CXL driver with functionality to dequeue the forwarded
> >> CXL error from the kfifo. Also, update the CXL driver to begin the protocol
> >> error handling processing using the work received from the FIFO.
> >>
> >> Introduce function cxl_proto_err_work_fn() to dequeue work forwarded by the
> >> AER service driver. This will begin the CXL protocol error processing with
> >> a call to cxl_handle_proto_error().
> >>
> >> Update cxl/core/native_ras.c by adding cxl_rch_handle_error_iter() that was
> >> previously in the AER driver. Add check that Endpoint is bound to a CXL
> >> driver.
> >>
> >> Introduce logic to take the SBDF values from 'struct cxl_proto_error_info'
> >> and use in discovering the erring PCI device. The call to pci_get_domain_bus_and_slot()
> >> will return a reference counted 'struct pci_dev *'. This will serve as
> >> reference count to prevent releasing the CXL Endpoint's mapped RAS while
> >> handling the error. Use scope-based __free() to put the reference count.
> >> This will change when adding support for CXL port devices in the future.
> >>
> >> Implement cxl_handle_proto_error() to differentiate between Restricted CXL
> >> Host (RCH) protocol errors and CXL virtual host (VH) protocol errors. RCH
> >> errors will be processed with a call to walk the associated Root Complex
> >> Event Collector's (RCEC) secondary bus looking for the Root Complex
> >> Integrated Endpoint (RCiEP) to handle the RCH error. Export pcie_walk_rcec()
> >> so the CXL driver can walk the RCEC's downstream bus, searching for the
> >> RCiEP.
> >>
> >> VH correctable error (CE) processing will call the CXL CE handler. VH
> >> uncorrectable errors (UCE) will call cxl_do_recovery(), implemented as a
> >> stub for now and to be updated in a future patch. Export pci_aer_clean_fatal_status()
> >> and pci_clean_device_status() used to clean up AER status after handling.
> >>
> >> Maintain the locking logic found in the original AER driver. Replace the
> >> existing device_lock() in cxl_rch_handle_error_iter() to use guard(device)
> >> lock for maintainability. CE errors did not include locking in previous driver
> >> implementation. Leave the updated CE handling path as-is.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > Couple minor comments below. Otherwise
> > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Thanks Dave.

Hi Terry,

Picking a random patch for another small, process-wise request.

If you agree with all suggestions in a review, don't reply to that email;
just put your thanks and what changed in the change log of the next version.

Skipping that reply cuts down on the volume of emails that need scrolling
through and generally helps people focus on the emails that matter, i.e. those
where there is a question or similar.

This one catches out a lot of contributors because it feels rude not to reply,
but doing it via the next version is more efficient for everyone!

Jonathan

p.s. I only bother moaning about this to contributors who are sending
quite a bit of useful stuff!