Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

From: Jiri Olsa
Date: Thu Dec 13 2018 - 06:29:32 EST


On Thu, Dec 13, 2018 at 11:01:49AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 12, 2018 at 08:26:39PM -0500, Steven Rostedt wrote:
> > On Thu, 13 Dec 2018 03:39:38 +0300
> > "Dmitry V. Levin" <ldv@xxxxxxxxxxxx> wrote:
> >
> > > btw, I didn't ask for the implementation to be ugly.
> > > You don't have to introduce polling into the kernel if you don't want to,
> > > userspace is perfectly capable of invoking wait4(2) in a loop.
> > > Just block the tracee, notify the tracer, and let it pick up the pieces.
> >
> > Note, there's been some discussion offlist to only have perf set a flag
> > when it dropped an event and have the ptrace code do the heavy lifting
> > of blocking the task and waking it back up. I think that would be a
> > cleaner solution and wont muck with perf as badly.
>
> It's still really horrid -- the question is not if we can come up with
> something, anything, to make strace work. The question is if we can
> extend something in a sane and maintainable manner to allow this.
>
> So there's a whole bunch of problems I see with all this, in no
> particular order:
>
> - we cannot block when writing to the actual buffer, and have to unroll
> the callstack and bolt on the blocking manualy in a few specific
> sites. This is ugly, inconsistent and maintenance heavy.
>
> - it only works for some 'magic' events that got the treatment, but not
> for many other you might expect it to work for with no real
> indication which and why.
>
> - the wakeups side is icky; the best I can come up with is making the
> data page R/O and single stepping on write fault, but that isn't
> multi-threading safe.
>
> Another alternative would be keeping the whole page R/O and
> using write(2) or an ioctl() to update the head pointer.
>
> Again, if we're going to do this; it needs to be done well and
> consistent and not as a special hack to enable strace-like
> functionality. And without clean and sane solutions to the above I just
> don't see it happening.
>
> Note that the first 2 points are equally true for ftrace; so I don't see
> how we could sanely add it there either.
>
>
> One, very big maybe, would be to add a new tracepoint type that includes
> a might_sleep() and we very carefully undo all the preempt_disable and
> go sleep where we should. That also gives the tracepoint crud the
> information it needs to publish the capability to userspace.

nice, I like this one.. seems like the most clean solution

jirka