Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

From: Nikolay Borisov
Date: Thu Jan 28 2021 - 02:58:24 EST




On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
> Hi,
>

<snip>

>
> Yeah, there is. Nikolay, could you try this tentative patch?
I can confirm that with this patch everything is working. I also applied
the following diff:

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..cc5a3a18816d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -96,8 +96,10 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
 	unsigned int ret;

-	if (in_nmi()) /* not supported yet */
+	if (in_nmi()) /* not supported yet */ {
+		trace_dump_stack(0);
 		return 1;
+	}

 	cant_sleep();



I can also confirm that the branch is being hit and that the following
call trace is produced:

=> __ftrace_trace_stack
=> trace_call_bpf
=> kprobe_perf_func
=> kprobe_int3_handler
=> exc_int3
=> asm_exc_int3
=> btrfs_sync_file
=> do_fsync
=> __x64_sys_fsync
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
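
To illustrate why the branch is hit on every probe, here is a minimal
userspace sketch. It assumes a simplified model in which the int3 path
brackets the kprobe handler with an NMI enter/exit pair, so the check at
the top of trace_call_bpf() always sees NMI context. All model_* names
and MODEL_NMI_* values are hypothetical stand-ins for illustration, not
the kernel's definitions.

```c
#include <assert.h>

/*
 * Userspace model only. The model_* names and MODEL_NMI_* values are
 * hypothetical stand-ins chosen for illustration; they are not the
 * kernel's definitions.
 */
#define MODEL_NMI_OFFSET (1u << 20)
#define MODEL_NMI_MASK   (0xfu << 20)

static unsigned int model_preempt_count; /* stand-in for the per-CPU count */
static unsigned int model_bpf_prog_runs; /* how often the "program" ran */

static int model_in_nmi(void)
{
	return (model_preempt_count & MODEL_NMI_MASK) != 0;
}

static void model_nmi_enter(void)
{
	model_preempt_count += MODEL_NMI_OFFSET;
}

static void model_nmi_exit(void)
{
	assert(model_in_nmi()); /* exit must pair with an earlier enter */
	model_preempt_count -= MODEL_NMI_OFFSET;
}

/* Mirrors the early return at the top of trace_call_bpf(). */
static unsigned int model_trace_call_bpf(int keep_nmi_check)
{
	if (keep_nmi_check && model_in_nmi())
		return 1; /* bail out before running any program */

	model_bpf_prog_runs++;
	return 0; /* placeholder result for the model */
}

/* Models the int3 path: the handler runs between NMI enter and exit. */
static unsigned int model_exc_int3(int keep_nmi_check)
{
	unsigned int ret;

	model_nmi_enter();
	ret = model_trace_call_bpf(keep_nmi_check);
	model_nmi_exit();

	return ret;
}
```

In this model, model_exc_int3(1) returns 1 and leaves model_bpf_prog_runs
at 0, which matches the skipped events. model_exc_int3(0) corresponds to
the tentative patch and lets the counter advance.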


>
> Of course this just drops the NMI check from the handler, so alternative
> checker is required. But I'm not sure what the original code concerns.
> As far as I can see, there seems no re-entrant block flag, nor locks
> among ebpf programs in runtime.
>
> Alexei, could you tell me what is the concerning situation for bpf?
>
> Thank you,
>
> From c5cd0e5f60ef6494c9e1579ec1b82b7344c41f9a Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Date: Thu, 28 Jan 2021 12:31:02 +0900
> Subject: [PATCH] tracing: bpf: Remove in_nmi() check from kprobe handler
>
> Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") has
> changed the kprobe handler to run in the NMI context, in_nmi() always returns
> true. This means the bpf events on kprobes are always skipped.

FWIW, I'd prefer it if, in addition to the original commit, you also mentioned:

ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking")
b6be002bcd1d ("x86/entry: Move nmi entry/exit into common code")

These commits changed how NMI state is managed: it is now handled in
exc_int3 rather than in the original do_int3. The latter no longer
contains any references to NMI-related code.

>
> Signed-off-by: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> ---
> kernel/trace/bpf_trace.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 6c0018abe68a..764400260eb6 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -96,9 +96,6 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
> {
> unsigned int ret;
>
> - if (in_nmi()) /* not supported yet */
> - return 1;
> -
> cant_sleep();
>
> if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
>