Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

From: Josh Poimboeuf
Date: Thu Jan 28 2021 - 11:52:07 EST


On Thu, Jan 28, 2021 at 06:45:56PM +0200, Nikolay Borisov wrote:
> On 28.01.21 г. 18:12 ч., Nikolay Borisov wrote:
> > On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
> >> Hi,
> >
> > <snip>
> >>
> >> Alexei, could you tell me what is the concerning situation for bpf?
> >
> > Another data point masami is that this affects bpf kprobes which are
> > entered via int3, alternatively if the kprobe is entered via
> > kprobe_ftrace_handler it works as expected. I haven't been able to
> > determine why a particular bpf probe won't use ftrace's infrastructure
> > if it's put at the beginning of the function. An alternative call chain
> > is :
> >
> > => __ftrace_trace_stack
> > => trace_call_bpf
> > => kprobe_perf_func
> > => kprobe_ftrace_handler
> > => 0xffffffffc095d0c8
> > => btrfs_validate_metadata_buffer
> > => end_bio_extent_readpage
> > => end_workqueue_fn
> > => btrfs_work_helper
> > => process_one_work
> > => worker_thread
> > => kthread
> > => ret_from_fork
> >
> >>
>
> I have a working theory why I'm seeing this. My kernel (broken) was
> compiled with retpolines off and with the gcc that comes with ubuntu
> (both 9 and 10:
> gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
> gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
> )
>
> this results in CFI being enabled so functions look like:
> 0xffffffff81493890 <+0>: endbr64
> 0xffffffff81493894 <+4>: callq 0xffffffff8104d820 <__fentry__>
>
> i.e fentry's thunk is not the first instruction on the function hence
> it's not going through the optimized ftrace handler. Instead it's using
> int3 which is broken as ascertained.
>
> After testing with my testcase I confirm that with cfi off and
> __fentry__ being the first entry bpf starts working. And indeed, even
> with CFI turned on if I use a probe like :
>
> bpftrace -e 'kprobe:btrfs_sync_file+4 {printf("kprobe: %s\n",
> kstack());}' &>bpf-output &
>
>
> it would be placed on the __fentry__ (and not endbr64) hence it works.
> So perhaps a workaround outside of bpf could essentially detect this
> scenario and adjust the probe to be on the __fentry__ and not preceding
> instruction if it's detected to be endbr64 ?

For now (and the foreseeable future), CET isn't enabled in the kernel.
So that endbr64 shouldn't be there in the first place. I can make a
proper patch in a bit.

diff --git a/Makefile b/Makefile
index e0af7a4a5598..5ccc4cdf1fb5 100644
--- a/Makefile
+++ b/Makefile
@@ -948,11 +948,8 @@ KBUILD_CFLAGS += $(call cc-option,-Werror=designated-init)
# change __FILE__ to the relative path from the srctree
KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)

-# ensure -fcf-protection is disabled when using retpoline as it is
-# incompatible with -mindirect-branch=thunk-extern
-ifdef CONFIG_RETPOLINE
+# Intel CET isn't enabled in the kernel
KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
-endif

# include additional Makefiles when needed
include-y := scripts/Makefile.extrawarn