Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)

From: H. Peter Anvin
Date: Thu Aug 25 2016 - 16:51:10 EST


On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov <dsafonov@xxxxxxxxxxxxx> wrote:
>This patch set is purely an RFC and is not meant to be applied.
>Also, at the RFC stage it builds only on x86_64.
>
>In a mail thread, Oleg said it would be worth introducing vm_file for
>vdso mappings, since uprobes currently cannot be placed on vDSO VMAs [1].
>In this patch set I introduce an in-kernel filesystem for vdso files.
>With the patches applied, the vDSO VMA has an inode and is just a private
>file mapping:
>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0                          [vvar]
>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18                         [vdso]
>
>Then I introduce an interface in uprobe_events for inserting uprobes into
>the vdso. FWIW:
> [~]# cd kernel/linux
> [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>Symbol table '.dynsym' contains 11 entries:
>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>     2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12 clock_gettime@@LINUX_2.6
>     3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12 __vdso_gettimeofday@@LINUX_2.6
>     4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12 gettimeofday@@LINUX_2.6
>     5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12 __vdso_time@@LINUX_2.6
>     6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
>     7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12 __vdso_clock_gettime@@LINUX_2.6
>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>     9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12 __vdso_getcpu@@LINUX_2.6
>    10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12 getcpu@@LINUX_2.6
> [~]# cd /sys/kernel/debug/tracing/
> [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
> [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
> [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
> [tracing]# echo 1 > events/uprobes/enable
> [tracing]# su test # it has UID=1001
> [tracing]$ date
> Thu Aug 25 17:19:29 MSK 2016
> [tracing]$ exit
> [tracing]# cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 175/175 #P:4
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> bash-11560 [001] d... 316.470236: time: (0x7ffcacebae30)
> bash-11560 [001] d... 316.471436: gettimeofday: (0x7ffcacebac50)
> bash-11560 [001] d... 316.477550: time: (0x7ffcacebae30)
> bash-11560 [001] d... 316.477655: time: (0x7ffcacebae30)
> mktemp-11568 [001] d... 316.479589: gettimeofday: (0x7ffc603f0c50)
> date-11571 [001] d... 316.481890: clock_gettime: (0x7ffec9db58d0)
>[...]
>
>If this approach is deemed acceptable, I will prepare a better version
>that fixes the following:
>o put vdsofs in the generic fs/* dir
>o support other archs and vdso blobs
>o remove the BUG_ON()s and the UID==1001 check
>o remove the externs and use headers only
>o refactor the code in create_trace_uprobe()
>o add some state to (struct trace_uprobe) so that, e.g., `cat uprobe_events`
>  prints those uprobes as vdso-based
>o document this interface in Documentation/trace/uprobetracer.txt
>o prepare a nice patch set?
>
>So, opinions? Is it worth adding something like this?
>
>[1]: https://lkml.org/lkml/2016/7/12/346
>
>Dmitry Safonov (3):
> x86/vdso: create vdso file, use it for mapping
> uprobe: drop isdigit() check in create_trace_uprobe
> uprobe: add vdso support
>
>Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
>Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>Cc: x86@xxxxxxxxxx
>Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
>
> arch/x86/entry/vdso/vma.c   | 148 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/trace/trace_uprobe.c |  50 +++++++++++----
> 2 files changed, 180 insertions(+), 18 deletions(-)
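
For readers following the quoted transcript: the offsets in the three uprobe_events lines are simply the Value column of the readelf symbol table. A minimal sketch of that derivation follows. The helper name probe_lines is hypothetical, and the ':vdso:/64:0x...' probe syntax is copied from the transcript; it is assumed to exist only with this RFC applied, not in a mainline kernel.

```python
# Rows copied from the readelf output quoted above (whitespace normalised).
READELF_SAMPLE = """\
     2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12 clock_gettime@@LINUX_2.6
     3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12 __vdso_gettimeofday@@LINUX_2.6
     4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12 gettimeofday@@LINUX_2.6
     6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
"""


def probe_lines(readelf_text, wanted):
    """Build uprobe_events lines (RFC-only syntax) for the wanted symbols."""
    lines = []
    for row in readelf_text.splitlines():
        fields = row.split()
        # A named symbol row has 8 columns: Num, Value, Size, Type, Bind,
        # Vis, Ndx, Name. Only FUNC symbols are interesting as probe targets.
        if len(fields) != 8 or fields[3] != "FUNC":
            continue
        name = fields[7].split("@")[0]  # drop the "@@LINUX_2.6" version tag
        if name in wanted:
            offset = int(fields[1], 16)  # Value column is hexadecimal
            lines.append(f"p:{name} :vdso:/64:{offset:#x}")
    return lines


if __name__ == "__main__":
    for line in probe_lines(READELF_SAMPLE, {"clock_gettime", "gettimeofday", "time"}):
        print(line)
```

With the sample rows above, this prints the same three definitions that the transcript echoes into uprobe_events.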

I think there is a lot to be said for this idea. However, a private mapping is definitely wrong for the vvar data; for the vdso code it could be considered either way I suppose.
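
As background on private versus shared mappings in general, here is a small userspace sketch on an ordinary temporary file. It is not vdsofs code and not necessarily the reasoning behind the remark above; the function name private_vs_shared is hypothetical. It shows the behaviour seen on Linux: once a page of a MAP_PRIVATE mapping has been copied on write, later updates to the underlying file no longer show up in that page.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def private_vs_shared():
    """Return (shared view, private view, shared view of the private write)."""
    with tempfile.TemporaryFile() as f:
        f.write(b"old!" + b"\0" * (PAGE - 4))
        f.flush()
        rw = mmap.PROT_READ | mmap.PROT_WRITE
        shared = mmap.mmap(f.fileno(), PAGE, flags=mmap.MAP_SHARED, prot=rw)
        private = mmap.mmap(f.fileno(), PAGE, flags=mmap.MAP_PRIVATE, prot=rw)
        try:
            # Writing through the private mapping copies the page (COW);
            # the write never reaches the file or the shared mapping.
            private[8:12] = b"priv"
            # An update made through the shared mapping stands in for
            # whoever maintains the data behind the file.
            shared[0:4] = b"new!"
            return bytes(shared[0:4]), bytes(private[0:4]), bytes(shared[8:12])
        finally:
            private.close()
            shared.close()


if __name__ == "__main__":
    print(private_vs_shared())
```

On Linux the private view keeps the old bytes after its page has been copied, while the shared view follows the update. Visibility of file changes in a private page that has not yet been copied is left unspecified by POSIX, so the sketch deliberately checks only the state after the copy.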