Re:

From: Dan Williams
Date: Fri Jan 27 2023 - 14:17:17 EST


Alison Schofield wrote:
> On Thu, Jan 26, 2023 at 05:59:03PM -0800, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >
> > > Subject: [PATCH v5 0/5] CXL Poison List Retrieval & Tracing
> > >
> > > Changes in v5:
> > > - Rebase on cxl/next
> > > - Use struct_size() to calc mbox cmd payload .min_out
> > > - s/INTERNAL/INJECTED mocked poison record source
> > > - Added Jonathan Reviewed-by tag on Patch 3
> > >
> > > Link to v4:
> > > https://lore.kernel.org/linux-cxl/cover.1671135967.git.alison.schofield@xxxxxxxxx/
> > >
> > > Add support for retrieving device poison lists and store the returned
> > > error records as kernel trace events.
> > >
> > > The handling of the poison list is guided by the CXL 3.0 Specification
> > > Section 8.2.9.8.4.1. [1]
> > >
> > > Example, triggered by memdev:
> > > $ echo 1 > /sys/bus/cxl/devices/mem3/trigger_poison_list
> > > cxl_poison: memdev=mem3 pcidev=cxl_mem.3 region= region_uuid=00000000-0000-0000-0000-000000000000 dpa=0x0 length=0x40 source=Internal flags= overflow_time=0
> >
> > I think the pcidev= field wants to be called something like "host" or
> > "parent", because there is no strict requirement that a 'struct
> > cxl_memdev' is related to a 'struct pci_dev'. In fact in that example
> > "cxl_mem.3" is a 'struct platform_device'. Now that I think about it, I
> > think all CXL device events should be emitting the PCIe serial number
> > for the memdev.
>
> Will do, 'host' and add PCIe serial no.
>
> >
> > I will look at the implementation, but do region= and region_uuid= get
> > populated when mem3 is a member of the region?
>
> Not always.
> In the case above, where the trigger was by memdev, no.
> Region= and region_uuid= (and in the follow-on patch, hpa=) only get
> populated if the poison was triggered by region, like the case below.
>
> It could be looked up for the by-memdev cases. Is that wanted?

Just trying to understand the semantics. That said, I do think it makes
sense for a memdev trigger to look up all impacted regions across the
device's entire DPA range, and for a region trigger to look up all
member memdevs, bounded by the DPA that each contributes to that region.
I just want to avoid someone having to trigger the region to get extra
information that was readily available from a memdev listing.

>
> Thanks for the reviews, Dan!
> >
> > >
> > > Example, triggered by region:
> > > $ echo 1 > /sys/bus/cxl/devices/region5/trigger_poison_list
> > > cxl_poison: memdev=mem0 pcidev=cxl_mem.0 region=region5 region_uuid=bfcb7a29-890e-4a41-8236-fe22221fc75c dpa=0x0 length=0x40 source=Internal flags= overflow_time=0
> > > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region=region5 region_uuid=bfcb7a29-890e-4a41-8236-fe22221fc75c dpa=0x0 length=0x40 source=Internal flags= overflow_time=0
> > >
> > > [1]: https://www.computeexpresslink.org/download-the-specification
> > >
> > > Alison Schofield (5):
> > > cxl/mbox: Add GET_POISON_LIST mailbox command
> > > cxl/trace: Add TRACE support for CXL media-error records
> > > cxl/memdev: Add trigger_poison_list sysfs attribute
> > > cxl/region: Add trigger_poison_list sysfs attribute
> > > tools/testing/cxl: Mock support for Get Poison List
> > >
> > > Documentation/ABI/testing/sysfs-bus-cxl | 28 +++++++++
> > > drivers/cxl/core/mbox.c | 78 +++++++++++++++++++++++
> > > drivers/cxl/core/memdev.c | 45 ++++++++++++++
> > > drivers/cxl/core/region.c | 33 ++++++++++
> > > drivers/cxl/core/trace.h | 83 +++++++++++++++++++++++++
> > > drivers/cxl/cxlmem.h | 69 +++++++++++++++++++-
> > > drivers/cxl/pci.c | 4 ++
> > > tools/testing/cxl/test/mem.c | 42 +++++++++++++
> > > 8 files changed, 381 insertions(+), 1 deletion(-)
> > >
> > >
> > > base-commit: 589c3357370a596ef7c99c00baca8ac799fce531
> > > --
> > > 2.37.3
> > >
> >
> >
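For reference, here is a sketch of how the dpa=, length= and source= fields in the trace output quoted above might be decoded from a raw media error record. It assumes the CXL 3.0 layout as I read Section 8.2.9.8.4.1: error source in address bits [2:0], DPA in bits [63:6], and length in 64-byte units. It also assumes the little-endian payload has already been converted to CPU byte order. The macro, enum and function names are hypothetical, not the ones used in the patches. Check the bit positions and source encodings against the specification before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: address bits [2:0] = error source, bits [63:6] = DPA. */
#define POISON_SOURCE_MASK	0x7ULL
#define POISON_START_MASK	(~0x3FULL)	/* bits 63:6 */
#define POISON_LEN_MULT		64		/* length field is in 64-byte units */

static_assert((POISON_START_MASK & POISON_SOURCE_MASK) == 0,
	      "source and DPA fields must not overlap");

/* Assumed source encodings; verify against the spec. */
enum poison_source {
	POISON_SOURCE_UNKNOWN = 0,
	POISON_SOURCE_EXTERNAL = 1,
	POISON_SOURCE_INTERNAL = 2,
	POISON_SOURCE_INJECTED = 3,
	POISON_SOURCE_VENDOR = 7,
};

struct poison_decoded {
	uint64_t dpa;		/* device physical address of the error */
	uint64_t length;	/* length in bytes */
	unsigned int source;	/* one of enum poison_source */
};

/* Split one raw record (already in CPU byte order) into its fields. */
static struct poison_decoded decode_media_error(uint64_t address,
						uint32_t length)
{
	struct poison_decoded d = {
		.dpa = address & POISON_START_MASK,
		.length = (uint64_t)length * POISON_LEN_MULT,
		.source = (unsigned int)(address & POISON_SOURCE_MASK),
	};
	return d;
}

/* Map a source value to the string printed in the trace event. */
static const char *poison_source_name(unsigned int source)
{
	switch (source) {
	case POISON_SOURCE_EXTERNAL:	return "External";
	case POISON_SOURCE_INTERNAL:	return "Internal";
	case POISON_SOURCE_INJECTED:	return "Injected";
	case POISON_SOURCE_VENDOR:	return "Vendor";
	default:			return "Unknown";
	}
}
```

Under this assumed layout, the quoted dpa=0x0 length=0x40 entries would correspond to a raw Media Error Length of 1, that is, a single 64-byte unit.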