Re: uprobes && pre-filtering

From: Oleg Nesterov
Date: Tue Nov 06 2012 - 12:02:28 EST


On 11/06, Srikar Dronamraju wrote:
>
> > Hello.
> >
> > There is a known (and by design) problem with uprobes. They act
> > systemwide, there is no pre-filtering. Just some random thoughts
> > to provoke the discussion.
> >
> > - I think that the current uprobe_consumer->filter(task) should die.
> >
> > It buys nothing. It is called right before ->handler() and this is
> > pointless, ->handler() can call it itself.
> >
>
> I think it helps to call filter before handler.

How? The hanlder can simply call this function at the start.

But this is minor. I think this can double the work sometimes. If ->handler
needs to do something nontrivial, it will probably look into my_traced_tasks
database anyway to find the additional info which represents the tracee.
And most probably ->filter() will do the same lookup unnecessary.

> When a tracer has specified a filter condition and then we run a handler
> for cases that dont fit the handler looks a little odd.
>
> Another reason for having the filters in the current way was to have a
> set of standard filters in uprobes code so that all users dont need to
> recreate these filters.

IOW, you mean that both register_for_each_vma() and handler_chain() should
use the same ->filter() method? Personally I do not think they should.

Because the semantics is different imo. register_for_each_vma() needs
to know if we should modify mm and insert the breakpoint. handler_chain()
just tries to skip ->handler() (and for no reason, imho).

> > And more importantly, I do not think this hook (with the same
> > semantics) can/should be used by both register and handler_chain().
> >
> > - We should change register_ and and mmap to filter out the tasks
> > which we do not want to trace. To simplify, lets forget about
> > multiple consumers first.
> >
> > Everything is simple, install_breakpoint() callers should simply
> > call consumer->filter(args) and do nothing if it returns false.
> >
> > The main problem is, what should be passed as "args". I think it is
> > pointless to use task_struct as an argument. And not because there
> > is no simple way to find all tasks which use this particular mm in
> > register_for_each_vma(), even if it was possible I think this makes
> > no sense.
> >
>
> Having mm instead of task is fine with me. But this would mostly mean
> that the prefilter, i.e filter at the time of registration and the
> filter at the time of handling should be different

Yes, I think they should be different, please see above.

We need the pre-filtering to avoid the unnecessary traps, not to avoid
the function call when the task already hit the breakpoint.

> or we remove the
> handling time filtering and expect the handler to do the filtering.

Yes. But once again, if ->handler() wants to use the same function it
can simply call it.

> Having a task instead of mm helps in having the same filter run at both
> places.

And this is where we start to disagree. Namely, I do not think that
register_for_each_vma() should even try to find mm user(s).

Because once again, a) whater register_for_each_vma() can do to iterate
the tasks can be done by ->filter, b) right now I do not see any potential
user who will need this so we should not overdesign this.

> > If 2 tasks/threads share the same mm they will share int3 as well,
> > so I think we need
> >
> > enum filter_mode {
> > UPROBE_FILTER_REGISTER,
> > UPROBE_FILTER_MMAP,
> > /* more */
> > };
> >
> > consumer->filter(enum filter_mode mode, struct mm_struct *mm);
>
> what would a tracing session having filter_mode == UPROBE_FILTER_REGISTER mean?
> Does that mean it can trace only instances in currently mapped vmas?
> Would it handle a probe instance in a process that is already running but has not yet mapped a vma?
>
> Also would a tracing session having filter_mode == UPROBE_FILTER_MMAP
> mean that it would only handle probepoints in only vmas that are going
> to be mapped in future?

No, no, you misunderstood. Sorry I was unclear and yes, the naming I used
was confusing.

I meant that both register_for_each_vma() and uprobe_mmap() (lets ignore
other callers and other modes) call the same ->filter() method but with
the different "mode" argument, UPROBE_FILTER_REGISTER or UPROBE_FILTER_MMAP.
This is only the hint, for example UPROBE_FILTER_MMAP can use current
but UPROBE_FILTER_REGISTER obviously can't.

> > - Perhaps we should extend the API. We can add
> >
> > uprobe_apply(consumer, task, bool add_remove);
> >
> > which adds/removes breakpoints to task->mm.
> >
> > This way consumer can probe every task it wants to trace after
> > uprobe_register().
> >
> > Its ->filter(UPROBE_FILTER_REGISTER) should simply return false. Or,
> > better, we can split uprobe_register() into 2 functions,
> > __uprobe_register() and uprobe_apply_all() which actually does
> > register_for_each_vma().
> >
> > ***** QUESTION *****: perhaps this is all systemtap needs? ignoring
> > UPROBE_FILTER_MMAP.
> >
> > - Multiple consumers. uprobe_register/uprobe_unregister should be
> > modified so that register_for_each_vma() is called every time when
> > the new consumer comes or consumer goes away.
> >
> > uprobe_apply(add_remove => false) should obviously consult other
> > consumers too.
>
>
> So in this case, would uprobe_register() just add a consumer to a
> new/existing uprobe. The actual probe insertion is done by the
> uprobe_apply()/uprobe_apply_all().

Yes. Not sure we really need this, but to me this extension looks natural.

Frank, Josh, do you think it can help systemtap ?


Suppose that the consumer no longer wants to trace the task T but it
has other tasks to trace. If we only rely on ->filter, the tracer should
do unregister + register, this doesn't look good. Or it wants to add
the new task to its trace-list...

> Also in this case, there is no
> filter element in the uprobe_consumer. Right?

Why? it could be there.

> I am just wondering if people could use/misuse this
> for example: - somebody could keep modifying and reusing the consumers
> but one probe registration, with subsequent uprobe_apply().

Yes, exactly.

> Unlike now we could have lots of uprobes but all with defunct consumers.

sorry, can't understand...

> > - Perhaps we should teach handle_swbp() to remove the unwanted breakpoints.
> >
> > If every ->handler() returns, say, UPROBE_GO_AWAY handle_swbp() should
> > remove this breakpoint. Of course, a consumer must be sure that if it
> > returns UPROBE_GO_AWAY this task can't share ->mm with another task it
> > wants to trace.
>
> Its a good idea to remove redundant probepoints from the breakpoint hit
> context. However,
>
> - even from the handlers() perspective, if we have to identify that this
> consumer isnt looking at this mm, we need something like
> for_each_mm_user(). No?

Yes and no, I think.

Yes, _in general_ it is needed. (although once again, where is potential
users?).

But in the simple case - no. For example, filter-by-pid should return
UPROBE_GO_AWAY if current->mm != find_task_by_vpid(PID)->mm.

> - also if we are looking for removing a breakpoint, I would think its
> better done in common code, i.e uprobe code rather than keeping it in
> every handler. So I think keeping the filter logic before the handler
> is good.

Again, I do not understand why do you think so. OK, I won't really insist,
we can add UBPROBE_FILTER_BP_HIT or whatever. Still I do not understand
why should we mix remove-this-breakpoint and do-not-call-hanlder. And
note that ->handler() has to do addtional checks anyway.

Anyway. Do you agree that filter's arguments should be changed? If yes,
then we should remove handler_chain()->filter(), then discuss why we
should add it back and which semantics it should have. Agreed?

> > Or consumer->handler() can do uprobe_apply(add_remove => false) itself,
> > but this needs more discussion.
> >
> > The point is that if the filtering at UPROBE_FILTER_REGISTER time is
> > not possible, ->filter(UPROBE_FILTER_REGISTER) can return true. A
> > "wrong" int3 doesn't hurt until the task actually hits the breakpoint,
> > and I think that a single bp-hit is fine performance-wise.
> >
>
> Agree. Also the reason why we inserted this probe in the current mm
> might have changed and we might no more need the probe.
> For example, traced thread in a multithreaded process actually died and
> all other threads may not be interested

Yes.

> > - fork(). The child inherits all breakpoints from parent, and uprobes
> > can't control this. What can we do?
> >
> > * We can add another uprobe hook which does something like
> >
> > for_each_uprobe_in_each_vma(child_mm) {
> > if (filter(UPROBE_FILTER_FORK))
> > install_breakoint();
> > else
> > remove_breakpoint();
> > }
> >
> >
> > But is is not clear where can we add this hook. dup_mmap()
> > looks appealing, but at this time the child is still under
> > construction, consumer->filter() can't look at task_struct.
> >
> > And of course, it is not nice to slow down fork().
> >
> > * If we only care about the unwanted breakpoints, perhaps it
> > would be better to rely on UPROBE_GO_AWAY above?
> >
> > * Finally, do we care at all? Again, who can ever need to
> > re-install breakpoints after fork?
> >
> > systemtap (iiuc) doesn't need this. And even if it does
> > or will need, I guess it can hook fork itself and use
> > uprobe_apply() ?
>
>
> I was thinking if we can implement for_each_mm_user() using additional
> mm->flags.

(I guess you didn't meant it can help to solve the problem with fork)

I do not know if we can (or should) implement for_each_mm_user().

My main point was, the core uprobes should not care. Once again,
who do you think will need it? systemtap doesn't afaics, neither
debug/tracing/uprobe_events.

And once again, even if we have to implement it for some new tricky
tracer, why its ->filter can not do for_each_mm_user() itself?

> Would something like this work?
> ...
>
> - for_each_mm_users would mostly boil down to searching in children,
> siblings unless MMF_REPARENTED flag is set.

Whose children? I don't think mm->owner is always the root of the
tree. And what about the locking? I do not think we want to call
->filter under (say) tasklist.



But again, again. Why do you think register_for_each_vma() should do
this? IOW, why should we think about for_each_mm_user() at all (at
least right now) ?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/