Re: [PATCH 2/2] tracing: Add sample code for custom trace events

From: Joel Fernandes
Date: Wed Mar 02 2022 - 22:23:10 EST


Hi Steven,

On Tue, Mar 1, 2022 at 10:28 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
>
> Add sample code to show how to create custom trace events in the tracefs
> directory that can be enabled and modified like any event in tracefs
> (including triggers, histograms, synthetic events and event probes).
>
> The example creates a custom sched_switch and a custom sched_waking event
> to limit what is recorded:
>
> If the custom sched_switch only records the prev_prio, next_prio and
> next_pid, it can bring the size down from 64 bytes per event to just 16
> bytes!
>
> If sched_waking only records the prio and pid of the woken task, it will
> bring the size down from 36 bytes to 12 bytes per event.
>
> This will allow for a much smaller footprint in the ring buffer and keep
> more events from being dropped.
>
> Suggested-by: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>

Just two comments below related to the event fields; other than that, I
tested it and it works quite well, so:
Tested-By: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>

> ---
[..]
> diff --git a/samples/trace_events/trace_custom_sched.c b/samples/trace_events/trace_custom_sched.c
> new file mode 100644
> index 000000000000..5271a567d99b
> --- /dev/null
> +++ b/samples/trace_events/trace_custom_sched.c
> @@ -0,0 +1,280 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * event tracer
> + *
> + * Copyright (C) 2022 Google Inc, Steven Rostedt <rostedt@xxxxxxxxxxx>
> + */
> +
> +#define pr_fmt(fmt) fmt
> +
> +#include <linux/trace_events.h>
> +#include <linux/version.h>
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <trace/events/sched.h>
> +
> +#define THIS_SYSTEM "custom_sched"
> +
> +#define SCHED_PRINT_FMT \
> + C("prev_prio=%d next_pid=%d next_prio=%d", REC->prev_prio, REC->next_pid, \

Probably prev_pid should be included so we know what the previous task was?

Or are you expecting that a prior sched_switch would have that
information? If so, then prev_prio is also not needed, as the previous
sched_switch's next_prio would already have the prio. That would save
even more space.

> + REC->next_prio)
> +
> +#define SCHED_WAKING_FMT \
> + C("pid=%d prio=%d", REC->pid, REC->prio)
> +

I think including the target_cpu of a wakeup is also really important,
to show which CPU the task is going to be woken up on. Maybe we can drop
prio instead, since a subsequent sched_switch will have the priority in
next_prio.

[..]
> +static void __exit trace_sched_exit(void)
> +{
> + trace_set_clr_event(THIS_SYSTEM, "sched_switch", 0);
> + trace_set_clr_event(THIS_SYSTEM, "sched_waking", 0);
> +
> + trace_remove_event_call(&sched_switch_call);
> + trace_remove_event_call(&sched_waking_call);
> +}
> +
> +module_init(trace_sched_init);
> +module_exit(trace_sched_exit);
> +
> +MODULE_AUTHOR("Steven Rostedt");
> +MODULE_DESCRIPTION("Custom scheduling events");
> +MODULE_LICENSE("GPL");
> +

Could you also remove the extra blank line at the end of the file?

Thanks,
Joel