Re: [RFC PATCH 1/9] cxl/mem: Implement Get Event Records command

From: Ira Weiny
Date: Tue Aug 16 2022 - 19:35:31 EST


On Tue, Aug 16, 2022 at 12:39:58PM -0400, Steven Rostedt wrote:
> On Fri, 12 Aug 2022 22:32:35 -0700
> ira.weiny@xxxxxxxxx wrote:
>
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > Event records are defined for CXL devices. Each record is reported in
> > one event log. Devices are required to support the storage of at least
> > one event record in each event log type.
> >
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> >
> > Software queries events via the Get Event Record mailbox command; CXL
> > v3.0 section 8.2.9.2.2.
> >
> > Issue the Get Event Record mailbox command on driver load. Trace each
> > record found, as well as any overflow conditions. Only 1 event is
> > requested for each query. Optimization of multiple record queries is
> > deferred.
> >
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> >
> > NOTE: checkpatch is not completely happy with the tracing part of this
> > patch but AFAICT it is correct. I'm open to suggestions if I've done
> > something wrong.
>
> The include/trace/events/*.h files are all broken according to
> checkpatch.pl ;-) Don't worry about the formatting there. I need to update
> that script to detect that it's looking at TRACE_EVENT() that has different
> rules than normal macros.

Thanks!

[snip]

> > +
> > +/*
> > + * Now redefine the EM and EMe macros to map the enums to the strings that will
> > + * be printed in the output
> > + */
> > +#undef EM
> > +#undef EMe
> > +#define EM(a, b) {a, b},
> > +#define EMe(a, b) {a, b}
> > +
> > +TRACE_EVENT(cxl_event_overflow,
> > +
> > + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > + struct cxl_get_event_payload *payload),
> > +
> > + TP_ARGS(dev_name, log, payload),
> > +
> > + TP_STRUCT__entry(
> > + __string(dev_name, dev_name)
> > + __field(int, log)
> > + __field(u16, count)
> > + __field(u64, first)
> > + __field(u64, last)
>
> Because you have a dynamic string, you will save some bytes in the ring
> buffer if you have:
>
> __string(dev_name, dev_name)
> __field(int, log)
> __field(u64, first)
> __field(u64, last)
> __field(u16, count)

Thanks I missed this one. I was trying to pack things better but I missed this
one.

>
>
> > + ),
> > +
> > + TP_fast_assign(
> > + __assign_str(dev_name, dev_name);
> > + __entry->log = log;
> > + __entry->count = le16_to_cpu(payload->overflow_err_count);
> > + __entry->first = le64_to_cpu(payload->first_overflow_timestamp);
> > + __entry->last = le64_to_cpu(payload->last_overflow_timestamp);
> > + ),
> > +
> > + TP_printk("%s: EVENT LOG %s OVERFLOW %u records from %llu to %llu",
> > + __get_str(dev_name), show_log_type(__entry->log),
> > + __entry->count, __entry->first, __entry->last)
> > +
> > +);
> > +
> > +/*
> > + * Common Event Record Format
> > + * CXL v2.0 section 8.2.9.1.1; Table 153
> > + */
> > +#define CXL_EVENT_RECORD_FLAG_PERMANENT BIT(2)
> > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
> > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
> > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
> > +#define show_hdr_flags(flags) __print_flags(flags, " | ", \
> > + { CXL_EVENT_RECORD_FLAG_PERMANENT, "Permanent Condition" }, \
> > + { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "Maintanance Needed" }, \
> > + { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "Performance Degraded" }, \
> > + { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "Hardware Replacement Needed" } \
> > +)
> > +
> > +TRACE_EVENT(cxl_event,
> > +
> > + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> > + struct cxl_event_record_raw *rec),
> > +
> > + TP_ARGS(dev_name, log, rec),
> > +
> > + TP_STRUCT__entry(
> > + __string(dev_name, dev_name)
> > + __field(int, log)
> > + __array(u8, id, UUID_SIZE)
> > + __field(u32, flags)
> > + __field(u16, handle)
> > + __field(u16, related_handle)
> > + __field(u64, timestamp)
> > + __array(u8, data, EVENT_RECORD_DATA_LENGTH)
> > + __field(u8, length)
>
> The above looks good.
>
> > + ),
> > +
> > + TP_fast_assign(
> > + __assign_str(dev_name, dev_name);
> > + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > + __entry->log = log;
> > + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > + __entry->length = le32_to_cpu(rec->hdr.flags_length) & 0xFF;
> > + __entry->handle = le16_to_cpu(rec->hdr.handle);
> > + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
>
> I wonder if I should add le64_to_cpu() and le32_to_cpu() to the functions
> that libtraceevent can parse, and then we could move that logic to the
> TP_printk(). That is, out of the fast path.

I would not do it for this series. I don't see performance on these traces
being an issue. The built in logging the trace mechanism provides (space
considerations) as well as user API to gain access to the data is what we are
leveraging more.

What I would really like (and was looking for the time to enhance) would be a
way to create 'sub-class' like events.

In this case each of the traces comes with a common header which is part of the
generic cxl_event trace point.

It would be nice if we could define something like:

DECLARE_EVENT_BASE_CLASS(event_header,

TP_PROTO(const char *dev_name, enum cxl_event_log_type log_type,
struct cxl_event_record_hdr hdr),
...

TP_base_printk(<print header fields>),
);

Then somehow use that header in the

TRACE_EVENT(cxl_event,

TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
struct cxl_event_record_raw *rec),

TP_SUB_CLASS(dev_name, log, (struct cxl_event_record_hdr *)rec),

TP_fast_assign(
call_base_assign(rec),
...
),

TP_printk(<automatically print header fields>
<print raw data>),
...
);

TRACE_EVENT(cxl_gen_media_event,

TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
struct cxl_evt_gen_media *rec),

TP_SUB_CLASS(dev_name, log, (struct cxl_event_record_hdr *)rec),

TP_fast_assign(
call_base_assign(rec),
...
),

TP_printk(<automatically print header fields>
<print media event fields>),
...
);

<etc>

Does that make sense? I've no idea how this could be done. But I think the
real work would be in the printk merging between the base class (header prints)
and the rest of the record. I _think_ that the fast assign could just call the
assign defined in the base class somehow.

Am I way off base thinking this is possible?

Thanks for the review!
Ira