Re: [RFC PATCH v6 4/6] tracing/probe: Support user-space dereference

From: Masami Hiramatsu
Date: Wed May 08 2019 - 00:12:49 EST


On Mon, 6 May 2019 11:52:26 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Mon, 18 Mar 2019 15:43:52 +0900
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> > +.. _user_mem_access:
> > +User Memory Access
> > +------------------
> > +Kprobe events supports user-space memory access. For that purpose, you can use
> > +either user-space dereference syntax or 'ustring' type.
> > +
> > +The user-space dereference syntax allows you to access a field of a data
> > +structure in user-space. This is done by adding the "u" prefix to the
> > +dereference syntax. For example, +u4(%si) means it will read memory from the
> > +address in the register %si offset by 4, and the mory is expected to be in
>
> ^^^^
> "memory"

OK, thanks!

>
> > +user-space. You can use this for strings too, e.g. +u0(%si):string will read
> > +a string from the address in the register %si that is expected to be in user-
> > +space. 'ustring' is a shortcut way of performing the same task. That is,
> > ++0(%si):ustring is equivalent to +u0(%si):string.
> > +
> > +Note that kprobe-event provides the user-memory access syntax but it doesn't
> > +use it transparently. This means if you use normal dereference or string type
> > +for user memory, it might fail, and always fails on some arch. So user has to
>
> "and may always fail on some archs. The user has to carefully check
> if the target data is in kernel or user space."

OK. I'll update.

> > +check if the targe data is in kernel or in user space carefully.
> >
> > Per-Probe Event Filtering
> > -------------------------
> > diff --git a/Documentation/trace/uprobetracer.rst b/Documentation/trace/uprobetracer.rst
> > index 4346e23e3ae7..de8812c932bc 100644
> > --- a/Documentation/trace/uprobetracer.rst
> > +++ b/Documentation/trace/uprobetracer.rst
> > @@ -42,16 +42,17 @@ Synopsis of uprobe_tracer
> > @+OFFSET : Fetch memory at OFFSET (OFFSET from same file as PATH)
> > $stackN : Fetch Nth entry of stack (N >= 0)
> > $stack : Fetch stack address.
> > - $retval : Fetch return value.(*)
> > + $retval : Fetch return value.(\*1)
> > $comm : Fetch current task comm.
> > - +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
> > + +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*2)(\*3)
> > NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
> > FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
> > (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
> > (x8/x16/x32/x64), "string" and bitfield are supported.
>
> Hmm, shouldn't uprobes default to userspace. Isn't the purpose mostly
> to find out what's going on in userspace. Perhaps we should add a 'k'
> annotation to uprobes to denote that it's for kernel space, as that
> should be the exception and not the norm.

No, uprobe can not access kernel space, because it doesn't have the
current kernel context. Note that all registers, stacks which
can be accessed from uprobe handler are user-space. We can not access
kernel context from that. See below

> > - (*) only for return probe.
> > - (**) this is useful for fetching a field of data structures.
> > + (\*1) only for return probe.
> > + (\*2) this is useful for fetching a field of data structures.
> > + (\*3) Unlike kprobe event, "u" prefix will just be ignored.

Thus the 'u' is just ignored on uprobe event.

> >
> > Types
> > -----
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 7a6ed76ba104..b595d5ef099a 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -4815,10 +4815,11 @@ static const char readme_msg[] =
> > "\t args: <name>=fetcharg[:type]\n"
> > "\t fetcharg: %<register>, @<address>, @<symbol>[+|-<offset>],\n"
> > #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> > - "\t $stack<index>, $stack, $retval, $comm, $arg<N>\n"
> > + "\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
> > #else
> > - "\t $stack<index>, $stack, $retval, $comm\n"
> > + "\t $stack<index>, $stack, $retval, $comm,\n"
> > #endif
> > + "\t +|-[u]<offset>(<fetcharg>)\n"
> > "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
> > "\t b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
> > "\t <type>\\[<array-size>\\]\n"
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index e346229ddbba..9456f4ca3b8a 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -930,6 +930,12 @@ probe_mem_read(void *dest, void *src, size_t size)
> > return probe_kernel_read(dest, src, size);
> > }
> >
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size)
> > +{
> > + return probe_user_read(dest, src, size);
> > +}
> > +
> > /* Note that we don't verify it, since the code does not come from user space */
> > static int
> > process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
> > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > index 30054136cfde..00771e7b6ef8 100644
> > --- a/kernel/trace/trace_probe.c
> > +++ b/kernel/trace/trace_probe.c
> > @@ -253,6 +253,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> > {
> > struct fetch_insn *code = *pcode;
> > unsigned long param;
> > + int deref = FETCH_OP_DEREF;
> > long offset = 0;
> > char *tmp;
> > int ret = 0;
> > @@ -315,9 +316,14 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> > break;
> >
> > case '+': /* deref memory */
> > - arg++; /* Skip '+', because kstrtol() rejects it. */
> > - /* fall through */
> > case '-':
> > + if (arg[1] == 'u') {
> > + deref = FETCH_OP_UDEREF;
> > + arg[1] = arg[0];
> > + arg++;
> > + }
>
> It should be fine to add a 'k' version here too.
>
> > + if (arg[0] == '+')
> > + arg++; /* Skip '+', because kstrtol() rejects it. */
> > tmp = strchr(arg, '(');
> > if (!tmp)
> > return -EINVAL;
> > @@ -343,7 +349,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> > return -E2BIG;
> > *pcode = code;
> >
> > - code->op = FETCH_OP_DEREF;
> > + code->op = deref;
> > code->offset = offset;
> > }
> > break;
> > @@ -459,13 +465,14 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> > /* Store operation */
> > if (!strcmp(parg->type->name, "string") ||
> > !strcmp(parg->type->name, "ustring")) {
> > - if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_IMM &&
> > - code->op != FETCH_OP_COMM) {
> > + if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF
> > + && code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM) {
> > pr_info("string only accepts memory or address.\n");
> > ret = -EINVAL;
> > goto fail;
> > }
> > - if (code->op != FETCH_OP_DEREF || parg->count) {
> > + if ((code->op == FETCH_OP_IMM || code->op == FETCH_OP_COMM)
> > + || parg->count) {
> > /*
> > * IMM and COMM is pointing actual address, those must
> > * be kept, and if parg->count != 0, this is an array
> > @@ -478,7 +485,8 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> > }
> > }
> > /* If op == DEREF, replace it with STRING */
> > - if (!strcmp(parg->type->name, "ustring"))
> > + if (!strcmp(parg->type->name, "ustring") ||
>
> Perhaps have a "kstring" for kernel strings in uprobes.
>
> > + code->op == FETCH_OP_UDEREF)
> > code->op = FETCH_OP_ST_USTRING;
> > else
> > code->op = FETCH_OP_ST_STRING;
> > @@ -487,6 +495,9 @@ static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
> > } else if (code->op == FETCH_OP_DEREF) {
> > code->op = FETCH_OP_ST_MEM;
> > code->size = parg->type->size;
> > + } else if (code->op == FETCH_OP_UDEREF) {
> > + code->op = FETCH_OP_ST_UMEM;
> > + code->size = parg->type->size;
> > } else {
> > code++;
> > if (code->op != FETCH_OP_NOP) {
> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index 94cdcfdaced0..0feac0a81f82 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -92,9 +92,11 @@ enum fetch_op {
> > FETCH_OP_FOFFS, /* File offset: .immediate */
> > // Stage 2 (dereference) op
> > FETCH_OP_DEREF, /* Dereference: .offset */
> > + FETCH_OP_UDEREF, /* User-space Dereference: .offset */
> > // Stage 3 (store) ops
> > FETCH_OP_ST_RAW, /* Raw: .size */
> > FETCH_OP_ST_MEM, /* Mem: .offset, .size */
> > + FETCH_OP_ST_UMEM, /* Mem: .offset, .size */
> > FETCH_OP_ST_STRING, /* String: .offset, .size */
> > FETCH_OP_ST_USTRING, /* User String: .offset, .size */
> > // Stage 4 (modify) op
> > diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
> > index 7526f6f8d7b0..06f2d901c4cf 100644
> > --- a/kernel/trace/trace_probe_tmpl.h
> > +++ b/kernel/trace/trace_probe_tmpl.h
> > @@ -64,6 +64,8 @@ static nokprobe_inline int
> > fetch_store_string_user(unsigned long addr, void *dest, void *base);
> > static nokprobe_inline int
> > probe_mem_read(void *dest, void *src, size_t size);
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size);
> >
> > /* From the 2nd stage, routine is same */
> > static nokprobe_inline int
> > @@ -77,14 +79,21 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
> >
> > stage2:
> > /* 2nd stage: dereference memory if needed */
> > - while (code->op == FETCH_OP_DEREF) {
> > - lval = val;
> > - ret = probe_mem_read(&val, (void *)val + code->offset,
> > - sizeof(val));
> > + do {
> > + if (code->op == FETCH_OP_DEREF) {
> > + lval = val;
> > + ret = probe_mem_read(&val, (void *)val + code->offset,
> > + sizeof(val));
> > + } else if (code->op == FETCH_OP_UDEREF) {
> > + lval = val;
> > + ret = probe_mem_read_user(&val,
> > + (void *)val + code->offset, sizeof(val));
> > + } else
> > + break;
> > if (ret)
> > return ret;
> > code++;
> > - }
> > + } while (1);
> >
> > s3 = code;
> > stage3:
> > @@ -109,6 +118,10 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
> > case FETCH_OP_ST_MEM:
> > probe_mem_read(dest, (void *)val + code->offset, code->size);
> > break;
> > + case FETCH_OP_ST_UMEM:
> > + probe_mem_read_user(dest, (void *)val + code->offset,
> > + code->size);
> > + break;
> > case FETCH_OP_ST_STRING:
> > loc = *(u32 *)dest;
> > ret = fetch_store_string(val + code->offset, dest, base);
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index f4e37c4f8a21..5bc8c3686f6f 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -140,6 +140,13 @@ probe_mem_read(void *dest, void *src, size_t size)
> >
> > return copy_from_user(dest, vaddr, size) ? -EFAULT : 0;
> > }
> > +
> > +static nokprobe_inline int
> > +probe_mem_read_user(void *dest, void *src, size_t size)
> > +{
> > + return probe_mem_read(dest, src, size);
>
> Hmm, if probe_mem_read() is the same as probe_mem_read_user(), perhaps
> not even have a 'u' version for uprobes.

Yes, this is just a hack for sharing the template code.

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>