Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)

From: Dmitry Safonov
Date: Thu Aug 25 2016 - 20:04:19 EST


2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@xxxxxxxxx>:
> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov <dsafonov@xxxxxxxxxxxxx> wrote:
>>This patches set is cleanly RFC and is not supposed to be applied.
>>Also for RFC time it builds only on x86_64.
>>
>>So, in a mail thread Oleg told that it would be worth to introduce
>>vm_file
>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>[1].
>>In this patches set I introduce in-kernel filesystem for vdso files.
>>After patches vDSO VMA now has inode and is just a private file
>>mapping:
>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>> [vvar]
>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>> [vdso]
>>
>>Then I introduce interface in uprobe_events to insert uprobes in vdso.
>>FWIW:
>> [~]# cd kernel/linux
>> [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>Symbol table '.dynsym' contains 11 entries:
>> Num: Value Size Type Bind Vis Ndx Name
>> 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
>> 1: 0000000000000470 0 SECTION LOCAL DEFAULT 8
>>2: 00000000000008d0 885 FUNC WEAK DEFAULT 12
>>clock_gettime@@LINUX_2.6
>>3: 0000000000000c50 472 FUNC GLOBAL DEFAULT 12
>>__vdso_gettimeofday@@LINUX_2.6
>>4: 0000000000000c50 472 FUNC WEAK DEFAULT 12
>>gettimeofday@@LINUX_2.6
>>5: 0000000000000e30 21 FUNC GLOBAL DEFAULT 12
>>__vdso_time@@LINUX_2.6
>> 6: 0000000000000e30 21 FUNC WEAK DEFAULT 12 time@@LINUX_2.6
>>7: 00000000000008d0 885 FUNC GLOBAL DEFAULT 12
>>__vdso_clock_gettime@@LINUX_2.6
>> 8: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6
>>9: 0000000000000e50 41 FUNC GLOBAL DEFAULT 12
>>__vdso_getcpu@@LINUX_2.6
>>10: 0000000000000e50 41 FUNC WEAK DEFAULT 12
>>getcpu@@LINUX_2.6
>> [~]# cd /sys/kernel/debug/tracing/
>> [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>> [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>> [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>> [tracing]# echo 1 > events/uprobes/enable
>> [tracing]# su test # it has UID=1001
>> [tracing]$ date
>> Thu Aug 25 17:19:29 MSK 2016
>> [tracing]$ exit
>> [tracing]# cat trace
>> # tracer: nop
>> #
>> # entries-in-buffer/entries-written: 175/175 #P:4
>> #
>> # _-----=> irqs-off
>> # / _----=> need-resched
>> # | / _---=> hardirq/softirq
>> # || / _--=> preempt-depth
>> # ||| / delay
>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>> # | | | |||| | |
>> bash-11560 [001] d... 316.470236: time: (0x7ffcacebae30)
>> bash-11560 [001] d... 316.471436: gettimeofday: (0x7ffcacebac50)
>> bash-11560 [001] d... 316.477550: time: (0x7ffcacebae30)
>> bash-11560 [001] d... 316.477655: time: (0x7ffcacebae30)
>> mktemp-11568 [001] d... 316.479589: gettimeofday: (0x7ffc603f0c50)
>> date-11571 [001] d... 316.481890: clock_gettime: (0x7ffec9db58d0)
>>[...]
>>
>>If this approach will be decided as fine, I will prepare a better
>>version,
>>fixing the following things:
>>o put vdsofs in generic fs/* dir
>>o support other archs and vdso blobs
>>o remove BUG_ON()'s and UID==1001 check
>>o remove extern's and use headers only
>>o refactor code in create_trace_uprobe()
>>o add some state to (struct trace_uprobe), so i.e., `cat uprobe_events`
>>will
>> print those uprobes as vdso-based
>>o document this interface in Documentation/trace/uprobetracer.txt
>>o prepare nice patches set?
>>
>>So, opinions? Is it worth to add something like this?
>>
>>[1]: https://lkml.org/lkml/2016/7/12/346
>>
>>Dmitry Safonov (3):
>> x86/vdso: create vdso file, use it for mapping
>> uprobe: drop isdigit() check in create_trace_uprobe
>> uprobe: add vdso support
>>
>>Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
>>Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>>Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>>Cc: x86@xxxxxxxxxx
>>Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
>>
>>arch/x86/entry/vdso/vma.c | 148
>>++++++++++++++++++++++++++++++++++++++++++--
>> kernel/trace/trace_uprobe.c | 50 +++++++++++----
>> 2 files changed, 180 insertions(+), 18 deletions(-)
>
> I think there is a lot to be said for this idea. However, a private mapping is definitely wrong for the vvar data; for the vdso code it could be considered either way I suppose.

Thanks on your reply.
As you could see, I preserved pure mapping of pfn for vvar:
7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0 [vvar]
7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18 [vdso]
(no inode number).
I also think it would be useless to do the same to vvar as it
has just data and there is no point in probing it.

--
Dmitry