Re: [PATCH v9 5/6] cxl/trace: Add an HPA to cxl_poison trace events

From: Jonathan Cameron
Date: Mon Mar 20 2023 - 12:37:30 EST


On Sun, 19 Mar 2023 21:31:50 -0700
alison.schofield@xxxxxxxxx wrote:

> From: Alison Schofield <alison.schofield@xxxxxxxxx>
>
> When a cxl_poison trace event is reported for a region, the poisoned
> Device Physical Address (DPA) can be translated to a Host Physical
> Address (HPA) for consumption by user space.
>
> Translate and add the resulting HPA to the cxl_poison trace event.
> Follow the device decode logic as defined in the CXL Spec 3.0 Section
> 8.2.4.19.13.
>
> If no region currently maps the poison, assign ULLONG_MAX to the
> cxl_poison event hpa field.
>
> Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>

Hi Alison,

I poked this with a few more test cases in QEMU and ran into a corner that
probably wants addressing.

What should the tracepoints contain if the poisoned DPA length of a single
record returned by the device is greater than the interleave granularity
of an interleaved region?

That didn't matter until HPA was added, as we were just reporting a DPA
base and length. With the HPA present, the length is still only meaningful
in DPA space, not HPA space: the poisoned bytes are not contiguous in HPA
space once they cross an interleave granule. Userspace can figure this out,
but that's rather inelegant and would require rasdaemon or similar to go
and query the interleave granularity and ways.

I think the best thing to do in this case would be to break the single
returned DPA record up into multiple trace points at the interleave
granule boundaries, so that each one carries an HPA and a length that is
contiguous in both address spaces.
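
To make that concrete, here is a rough, untested-in-kernel sketch of the
splitting, written as standalone C rather than driver code. The
struct poison_chunk type and the dpa_to_hpa_offset() and split_poison()
helpers are made up for illustration. It assumes power-of-2 interleave ways
(the eiw < 8 case in cxl_dpa_to_hpa()) and works on offsets relative to the
decoder's DPA base and the region's HPA base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type for this sketch; not a kernel structure. */
struct poison_chunk {
	uint64_t dpa_offset;	/* offset from the decoder's DPA base */
	uint64_t hpa_offset;	/* offset from the region's HPA base */
	uint32_t length;	/* contiguous in both DPA and HPA space */
};

/* DPA offset to HPA offset, power-of-2 ways only (assumption). */
static uint64_t dpa_to_hpa_offset(uint64_t dpa_offset, unsigned int gran,
				  unsigned int ways, unsigned int pos)
{
	uint64_t granule = dpa_offset / gran;	/* granule index on this device */
	uint64_t low = dpa_offset % gran;	/* bits below the granule are unchanged */

	return (granule * ways + pos) * gran + low;
}

/*
 * Split one device poison record into chunks that never cross an
 * interleave granule boundary. Returns the number of chunks written,
 * which is capped at max_out.
 */
static size_t split_poison(uint64_t dpa_offset, uint64_t length,
			   unsigned int gran, unsigned int ways,
			   unsigned int pos,
			   struct poison_chunk *out, size_t max_out)
{
	size_t n = 0;

	assert(gran && (gran & (gran - 1)) == 0);
	assert(ways && (ways & (ways - 1)) == 0);
	assert(pos < ways);

	while (length && n < max_out) {
		/* Bytes left before the next granule boundary */
		uint64_t room = gran - (dpa_offset % gran);
		uint64_t len = length < room ? length : room;

		out[n].dpa_offset = dpa_offset;
		out[n].hpa_offset = dpa_to_hpa_offset(dpa_offset, gran,
						      ways, pos);
		out[n].length = (uint32_t)len;	/* len <= gran, so it fits */
		n++;

		dpa_offset += len;
		length -= len;
	}

	return n;
}
```

For a 2-way region with 256 byte granularity and a device at position 1, a
record at DPA offset 0x80 with length 0x200 would come out as three chunks,
at HPA offsets 0x180, 0x300 and 0x500 with lengths 0x80, 0x100 and 0x80.
Each of those could then be emitted as its own cxl_poison event.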

What do you think we should do?

Jonathan

> ---
> drivers/cxl/core/trace.c | 94 ++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/trace.h | 9 +++-
> 2 files changed, 102 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/trace.c b/drivers/cxl/core/trace.c
> index 29ae7ce81dc5..d0403dc3c8ab 100644
> --- a/drivers/cxl/core/trace.c
> +++ b/drivers/cxl/core/trace.c
> @@ -1,5 +1,99 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2022 Intel Corporation. All rights reserved. */
>
> +#include <cxl.h>
> +#include "core.h"
> +
> #define CREATE_TRACE_POINTS
> #include "trace.h"
> +
> +static bool cxl_is_hpa_in_range(u64 hpa, struct cxl_region *cxlr, int pos)
> +{
> + struct cxl_region_params *p = &cxlr->params;
> + int gran = p->interleave_granularity;
> + int ways = p->interleave_ways;
> + u64 offset;
> +
> + /* Is the hpa within this region at all */
> + if (hpa < p->res->start || hpa > p->res->end) {
> + dev_dbg(&cxlr->dev,
> + "Addr trans fail: hpa 0x%llx not in region\n", hpa);
> + return false;
> + }
> +
> + /* Is the hpa in an expected chunk for its pos(-ition) */
> + offset = hpa - p->res->start;
> + offset = do_div(offset, gran * ways);
> + if ((offset >= pos * gran) && (offset < (pos + 1) * gran))
> + return true;
> +
> + dev_dbg(&cxlr->dev,
> + "Addr trans fail: hpa 0x%llx not in expected chunk\n", hpa);
> +
> + return false;
> +}
> +
> +static u64 cxl_dpa_to_hpa(u64 dpa, struct cxl_region *cxlr,
> + struct cxl_endpoint_decoder *cxled)
> +{
> + u64 dpa_offset, hpa_offset, bits_upper, mask_upper, hpa;
> + struct cxl_region_params *p = &cxlr->params;
> + int pos = cxled->pos;
> + u16 eig = 0;
> + u8 eiw = 0;
> +
> + ways_to_eiw(p->interleave_ways, &eiw);
> + granularity_to_eig(p->interleave_granularity, &eig);
> +
> + /*
> + * The device position in the region interleave set was removed
> + * from the offset at HPA->DPA translation. To reconstruct the
> + * HPA, place the 'pos' in the offset.
> + *
> + * The placement of 'pos' in the HPA is determined by interleave
> + * ways and granularity and is defined in the CXL Spec 3.0 Section
> + * 8.2.4.19.13 Implementation Note: Device Decode Logic
> + */
> +
> + /* Remove the dpa base */
> + dpa_offset = dpa - cxl_dpa_resource_start(cxled);
> +
> + mask_upper = GENMASK_ULL(51, eig + 8);
> +
> + if (eiw < 8) {
> + hpa_offset = (dpa_offset & mask_upper) << eiw;
> + hpa_offset |= pos << (eig + 8);
> + } else {
> + bits_upper = (dpa_offset & mask_upper) >> (eig + 8);
> + bits_upper = bits_upper * 3;
> + hpa_offset = ((bits_upper << (eiw - 8)) + pos) << (eig + 8);
> + }
> +
> + /* The lower bits remain unchanged */
> + hpa_offset |= dpa_offset & GENMASK_ULL(eig + 7, 0);
> +
> + /* Apply the hpa_offset to the region base address */
> + hpa = hpa_offset + p->res->start;
> +
> + if (!cxl_is_hpa_in_range(hpa, cxlr, cxled->pos))
> + return ULLONG_MAX;
> +
> + return hpa;
> +}
> +
> +u64 cxl_trace_hpa(struct cxl_region *cxlr, struct cxl_memdev *cxlmd,
> + u64 dpa)
> +{
> + struct cxl_region_params *p = &cxlr->params;
> + struct cxl_endpoint_decoder *cxled = NULL;
> +
> + for (int i = 0; i < p->nr_targets; i++) {
> + cxled = p->targets[i];
> + if (cxlmd == cxled_to_memdev(cxled))
> + break;
> + }
> + if (!cxled || cxlmd != cxled_to_memdev(cxled))
> + return ULLONG_MAX;
> +
> + return cxl_dpa_to_hpa(dpa, cxlr, cxled);
> +}
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index 33a22d26e742..25dbf52ac327 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -631,6 +631,8 @@ TRACE_EVENT(cxl_memory_module,
> #define cxl_poison_overflow(flags, time) \
> (flags & CXL_POISON_FLAG_OVERFLOW ? le64_to_cpu(time) : 0)
>
> +u64 cxl_trace_hpa(struct cxl_region *cxlr, struct cxl_memdev *memdev, u64 dpa);
> +
> TRACE_EVENT(cxl_poison,
>
> TP_PROTO(struct cxl_memdev *cxlmd, struct cxl_region *region,
> @@ -645,6 +647,7 @@ TRACE_EVENT(cxl_poison,
> __field(u64, serial)
> __string(region, region)
> __field(u64, overflow_t)
> + __field(u64, hpa)
> __field(u64, dpa)
> __field(u32, length)
> __array(char, uuid, 16)
> @@ -664,18 +667,22 @@ TRACE_EVENT(cxl_poison,
> if (region) {
> __assign_str(region, dev_name(&region->dev));
> memcpy(__entry->uuid, &region->params.uuid, 16);
> + __entry->hpa = cxl_trace_hpa(region, cxlmd,
> + __entry->dpa);
> } else {
> __assign_str(region, "");
> memset(__entry->uuid, 0, 16);
> + __entry->hpa = ULLONG_MAX;
> }
> ),
>
> - TP_printk("memdev=%s host=%s serial=%lld region=%s region_uuid=%pU dpa=0x%llx length=0x%x source=%s flags=%s overflow_time=%llu",
> + TP_printk("memdev=%s host=%s serial=%lld region=%s region_uuid=%pU hpa=0x%llx dpa=0x%llx length=0x%x source=%s flags=%s overflow_time=%llu",
> __get_str(memdev),
> __get_str(host),
> __entry->serial,
> __get_str(region),
> __entry->uuid,
> + __entry->hpa,
> __entry->dpa,
> __entry->length,
> show_poison_source(__entry->source),