Re: [PATCH RFC 0/3] Static calls

From: Ingo Molnar
Date: Fri Nov 09 2018 - 02:50:16 EST



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> > - Does this feature have much value without retpolines? If not, should
> > we make it depend on retpolines somehow?
>
> Paravirt patching, as you mention in your later reply?

BTW., to look for candidates for this API, I'd suggest looking at the
function call frequencies in my (almost-)distro kernel vmlinux:

$ objdump -d vmlinux | grep -w callq | cut -f3- | sort | uniq -c | sort -n | tail -100

which gives:

502 callq ffffffff8157d050 <nla_put>
522 callq ffffffff81aaf420 <down_write>
536 callq ffffffff81547e60 <_copy_to_user>
615 callq ffffffff81a97700 <snprintf>
624 callq *0xffffffff82648428
624 callq ffffffff810cc810 <__might_sleep>
625 callq ffffffff81a93b90 <strcmp>
649 callq ffffffff81547dd0 <_copy_from_user>
651 callq ffffffff811ba930 <trace_seq_printf>
654 callq ffffffff8170b6f0 <_dev_warn>
691 callq ffffffff81a93790 <strlen>
693 callq ffffffff81a88dc0 <cpumask_next>
709 callq *0xffffffff82648438
723 callq ffffffff811bdbd0 <trace_hardirqs_on>
735 callq ffffffff810feac0 <up_write>
750 callq ffffffff8163e9f0 <acpi_ut_status_exit>
768 callq *0xffffffff82648430
814 callq ffffffff81ab2710 <_raw_spin_lock_irq>
841 callq ffffffff81a9e680 <__memcpy>
863 callq ffffffff812ae3d0 <__kmalloc>
899 callq ffffffff8126ac80 <__might_fault>
912 callq ffffffff81ab2970 <_raw_spin_unlock_irq>
939 callq ffffffff81aaaf10 <_cond_resched>
966 callq ffffffff811bda00 <trace_hardirqs_off>
1069 callq ffffffff81126f50 <rcu_read_lock_sched_held>
1078 callq ffffffff81097760 <__warn_printk>
1081 callq ffffffff8157b140 <__dynamic_dev_dbg>
1351 callq ffffffff8170b630 <_dev_err>
1365 callq ffffffff811050c0 <lock_is_held_type>
1373 callq ffffffff81a977f0 <sprintf>
1390 callq ffffffff8157b090 <__dynamic_pr_debug>
1453 callq ffffffff8155c650 <__list_add_valid>
1501 callq ffffffff812ad6f0 <kmem_cache_alloc_trace>
1509 callq ffffffff8155c6c0 <__list_del_entry_valid>
1513 callq ffffffff81310ce0 <seq_printf>
1571 callq ffffffff81ab2780 <_raw_spin_lock_irqsave>
1624 callq ffffffff81ab29b0 <_raw_spin_unlock_irqrestore>
1661 callq ffffffff81126fd0 <rcu_read_lock_held>
1986 callq ffffffff81104940 <lock_acquire>
2050 callq ffffffff811c5110 <trace_define_field>
2133 callq ffffffff81102c70 <lock_release>
2507 callq ffffffff81ab2560 <_raw_spin_lock>
2676 callq ffffffff81aadc40 <mutex_lock_nested>
3056 callq ffffffff81ab2900 <_raw_spin_unlock>
3294 callq ffffffff81aac610 <mutex_unlock>
3628 callq ffffffff81129100 <rcu_is_watching>
4462 callq ffffffff812ac2c0 <kfree>
6454 callq ffffffff8111a51e <printk>
6676 callq ffffffff81101420 <lockdep_rcu_suspicious>
7328 callq ffffffff81e014b0 <__x86_indirect_thunk_rax>
7598 callq ffffffff81126f30 <debug_lockdep_rcu_enabled>
9065 callq ffffffff810979f0 <__stack_chk_fail>

The most prominent call sites that already go through function pointers
today are:

$ objdump -d vmlinux | grep -w callq | grep \* | cut -f3- | sort | uniq -c | sort -n | tail -10

109 callq *0xffffffff82648530
134 callq *0xffffffff82648568
154 callq *0xffffffff826483d0
260 callq *0xffffffff826483d8
297 callq *0xffffffff826483e0
345 callq *0xffffffff82648440
345 callq *0xffffffff82648558
624 callq *0xffffffff82648428
709 callq *0xffffffff82648438
768 callq *0xffffffff82648430

That's all pv_ops->*() method calls:

ffffffff82648300 D pv_ops
ffffffff826485d0 D pv_info

Optimizing those thousands of function pointer calls would already be a
nice improvement.
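Those slot addresses can be turned back into byte offsets within pv_ops
by subtracting the pv_ops base address; a minimal sketch, using the
addresses copied from the dump above (mapping an offset to the actual
struct member name would additionally need the struct layout, e.g. from
debug info):

```shell
# pv_ops base per the nm output above; the three hottest indirect-call
# slots from the dump all land inside the pv_ops structure.
pv_ops_base=$(( 0xffffffff82648300 ))
for slot in 0xffffffff82648428 0xffffffff82648430 0xffffffff82648438; do
	printf 'callq *%s -> pv_ops + 0x%x\n' "$slot" $(( slot - pv_ops_base ))
done
# e.g. the first line prints: callq *0xffffffff82648428 -> pv_ops + 0x128
```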

But retpolines:

7328 callq ffffffff81e014b0 <__x86_indirect_thunk_rax>

ffffffff81e014b0 <__x86_indirect_thunk_rax>:
ffffffff81e014b0: ff e0 jmpq *%rax

... are even more prominent, and turned on in every distro as well,
obviously.
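The per-register breakdown of retpoline thunk calls falls out of the
same kind of pipeline; a sketch, fed here from a tiny inline sample so
it runs standalone (against a real build, pipe in `objdump -d vmlinux`
instead of the here-document):

```shell
# Count callq's to each __x86_indirect_thunk_<reg> variant.
count_thunk_calls() {
	grep -w callq | grep -o '__x86_indirect_thunk_[a-z0-9]*' |
		sort | uniq -c | sort -n
}

# Inline sample standing in for real `objdump -d vmlinux` output:
count_thunk_calls <<'EOF'
ffffffff81000100:	callq ffffffff81e014b0 <__x86_indirect_thunk_rax>
ffffffff81000140:	callq ffffffff81e014d0 <__x86_indirect_thunk_rbx>
ffffffff81000180:	callq ffffffff81e014b0 <__x86_indirect_thunk_rax>
EOF
# -> rbx with count 1, then rax with count 2 (most frequent last)
```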

Thanks,

Ingo