Re: [PATCH] tracing: Add a way to filter function addresses to function names

From: Ross Zwisler
Date: Fri Dec 16 2022 - 16:39:01 EST


On Wed, Dec 14, 2022 at 12:52:09PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@xxxxxxxxxxx>
>
> There's been several times where an event records a function address in
> its field and I needed to filter on that address for a specific function
> name. It required looking up the function in kallsyms, finding its size,
> and doing a compare of "field >= function_start && field < function_end".

This is amazingly useful!

> But this would change from boot to boot and is unreliable in scripts.
> Also, it is useful to have this at boot up, where the addresses will not
> be known. For example, on the boot command line:
>
> trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"

I think this should actually be:

trace_trigger="initcall_finish.traceoff if func.function == acpi_init"
^^^^

right? 'func' is the function pointer in the format:

[ /sys/kernel/debug/tracing/events/initcall/initcall_finish ]
# cat format
name: initcall_finish
ID: 20
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:initcall_t func; offset:8; size:8; signed:0;
field:int ret; offset:16; size:4; signed:1;

print fmt: "func=%pS ret=%d", REC->func, REC->ret

> To implement this, add a ".function" prefix, that will check that the
> field is of size long, and the only operations allowed (so far) are "=="
> and "!=".
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
> ---

<>

> @@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
> i += len;
> }
>
> + /* See if the field is a user space string */

Is this comment correct, or was it just copied from the .ustring handling
above? We aren't doing any string sanitization (strncpy_from_kernel_nofault()
and friends) AFAICT, just doing a kernel symbol lookup.

Maybe:

/* See if this is a kernel function name */

?

> + if ((len = str_has_prefix(str + i, ".function"))) {
> + function = true;
> + i += len;
> + }
> +
> while (isspace(str[i]))
> i++;
>
> @@ -1423,7 +1450,57 @@ static int parse_pred(const char *str, void *data,
> pred->offset = field->offset;
> pred->op = op;
>
> - if (ftrace_event_is_function(call)) {
> + if (function) {
> + /* The field must be the same size as long */
> + if (field->size != sizeof(long)) {
> + parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
> + goto err_free;
> + }
> +
> + /* Function only works with '==' or '!=' and an unquoted string */
> + switch (op) {
> + case OP_NE:
> + case OP_EQ:
> + break;
> + default:
> + parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
> + goto err_free;
> + }
> +
> + if (isdigit(str[i])) {
> + ret = kstrtol(num_buf, 0, &ip);
> + if (ret) {
> + parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
> + goto err_free;
> + }

Maybe I'm holding it by the wrong end, but can we actually use this to filter
based on an address? Hex doesn't work (as you'd expect from looking at
kstrol()), but decimal doesn't work for me either:

[ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
# echo "traceoff:1 if call_site.function == 0xffffffff96ca4240" > trigger

[ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
# echo "traceoff:1 if call_site.function == 18446744071944421952" > trigger
bash: echo: write error: Invalid argument

Should this interface insist that users use function names that we can look
up?