Re: [PATCH v2 bpf-next 1/3] perf: enable branch record for software events

From: Peter Zijlstra
Date: Mon Aug 30 2021 - 12:07:52 EST


On Mon, Aug 30, 2021 at 03:25:44PM +0000, Song Liu wrote:
> Thanks for the information! I did get confused by these macros for quite a
> while. Let me try with the _RET0 version.

Does your kernel have:

9ae6ab27f44e ("static_call: Update API documentation")

?

With that included, the comment at the top of static_call.h reads like
the below. Please let me know where you think this can be improved.


/*
* Static call support
*
* Static calls use code patching to hard-code function pointers into direct
* branch instructions. They give the flexibility of function pointers, but
* with improved performance. This is especially important for cases where
* retpolines would otherwise be used, as retpolines can significantly impact
* performance.
*
*
* API overview:
*
* DECLARE_STATIC_CALL(name, func);
* DEFINE_STATIC_CALL(name, func);
* DEFINE_STATIC_CALL_NULL(name, typename);
* DEFINE_STATIC_CALL_RET0(name, typename);
*
* __static_call_return0;
*
* static_call(name)(args...);
* static_call_cond(name)(args...);
* static_call_update(name, func);
* static_call_query(name);
*
* EXPORT_STATIC_CALL{,_TRAMP}{,_GPL}()
*
* Usage example:
*
* # Start with the following functions (with identical prototypes):
* int func_a(int arg1, int arg2);
* int func_b(int arg1, int arg2);
*
* # Define a 'my_name' reference, associated with func_a() by default
* DEFINE_STATIC_CALL(my_name, func_a);
*
* # Call func_a()
* static_call(my_name)(arg1, arg2);
*
* # Update 'my_name' to point to func_b()
* static_call_update(my_name, &func_b);
*
* # Call func_b()
* static_call(my_name)(arg1, arg2);
*
*
* Implementation details:
*
* This requires some arch-specific code (CONFIG_HAVE_STATIC_CALL).
* Otherwise, plain indirect calls through function pointers are used.
*
* Each static_call() site calls into a trampoline associated with the name.
* The trampoline has a direct branch to the default function. Updates to a
* name will modify the trampoline's branch destination.
*
* If the arch has CONFIG_HAVE_STATIC_CALL_INLINE, then the call sites
* themselves will be patched at runtime to call the functions directly,
* rather than calling through the trampoline. This requires objtool or a
* compiler plugin to detect all the static_call() sites and annotate them
* in the .static_call_sites section.
*
*
* Notes on NULL function pointers:
*
* Static_call()s support NULL functions, with many of the caveats that
* regular function pointers have.
*
* Clearly, calling a NULL function pointer is 'BAD', and the same goes for
* static_call()s (although with HAVE_STATIC_CALL it might not be immediately
* fatal). A NULL static_call can be the result of:
*
* DEFINE_STATIC_CALL_NULL(my_static_call, void (*)(int));
*
* which is equivalent to declaring a NULL function pointer with just a
* typename:
*
* void (*my_func_ptr)(int arg1) = NULL;
*
* or of calling static_call_update() with a NULL function. In both cases the
* HAVE_STATIC_CALL implementation will patch the trampoline with a RET
* instruction, instead of an immediate tail-call JMP. HAVE_STATIC_CALL_INLINE
* architectures can patch the trampoline call to a NOP.
*
* In all cases, any argument evaluation is unconditional. This differs from a
* regular conditional function pointer call:
*
* if (my_func_ptr)
*         my_func_ptr(arg1);
*
* where the argument evaluation also depends on the pointer value.
*
* When calling a static_call that can be NULL, use:
*
* static_call_cond(name)(arg1);
*
* which will include the required value tests to avoid NULL-pointer
* dereferences.
*
* To query which function is currently set to be called, use:
*
* func = static_call_query(name);
*
*
* DEFINE_STATIC_CALL_RET0 / __static_call_return0:
*
* Just as DEFINE_STATIC_CALL_NULL() / static_call_cond() optimize the
* conditional void function call, DEFINE_STATIC_CALL_RET0 /
* __static_call_return0 optimize a do-nothing function that just returns 0.
*
* This feature is strictly UB per the C standard (since it casts a function
* pointer to a different signature) and relies on the architecture ABI to
* make things work. In particular, it relies on caller stack cleanup and on
* the whole return register being clobbered for short return values. All
* normal cdecl-style ABIs conform.
*
* In particular, the x86_64 implementation replaces the 5-byte CALL
* instruction at the call site with a 5-byte clear of the RAX register,
* completely eliding any function call overhead.
*
* Notably, argument setup is still unconditional.
*
*
* EXPORT_STATIC_CALL() vs EXPORT_STATIC_CALL_TRAMP():
*
* The difference is that the _TRAMP variant tries to export only the
* trampoline, with the result that a module can use static_call{,_cond}() but
* not static_call_update().
*
*/