Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

From: Peter Zijlstra
Date: Fri Jan 29 2021 - 11:33:55 EST


On Fri, Jan 29, 2021 at 10:59:52AM -0500, Steven Rostedt wrote:
> On Fri, 29 Jan 2021 22:40:11 +0900
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> > > So what, they can all happen with random locks held. Marking them as NMI
> > > enables a whole bunch of sanity checks that are entirely appropriate.
> >
> > How about introducing an idea of Asynchronous NMI (ANMI) and Synchronous
> > NMI (SNMI)? kprobes and ftrace is synchronously called and can be controlled
> > (we can expect the context) but ANMI may be caused by asynchronous
> > hardware events on any context.
> >
> > If we can distinguish those 2 NMIs on preempt count, bpf people can easily
> > avoid the inevitable situation.
>
> I don't like the name NMI IN SNMI, because they are not NMIs. They are
> actually more like kernel exceptions. Even page faults in the kernel is
> similar to a kprobe breakpoint or ftrace. It can happen anywhere, with any
> lock held. Perhaps we need a kernel exception context? Which by definition
> is synchronous.

What problem are you trying to solve? AFAICT all these contexts have the
same restrictions, why try and muck about with different names for the
same thing?