Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang afterPTRACE_ATTACH

From: Oleg Nesterov
Date: Fri Feb 04 2011 - 08:13:25 EST


On 02/04, Tejun Heo wrote:
>
> Hello, Oleg, Roland.
>
> On Thu, Feb 03, 2011 at 10:44:50PM +0100, Oleg Nesterov wrote:
> > On 02/03, Roland McGrath wrote:
> > >
> > > IMHO this sort of band-aid does not really help the overall situation.
> > > It takes something that is intricate and fiddly and just fiddles it a
> > > bit more. Userland will still have to handle older kernels where this
> > > behavior is not there. If userland does anything that relies on this
> > > new behavior, then it will have to try somehow to figure out which
> > > kernel versions have which behavior and adapt, etc.
> >
> > Absolutely agreed.
> >
> > As I said, I am not sure this patch makes sense. I only sent it
> > because I have to react to the bug report.
> >
> > > When the old behaviors are unhelpful like this, I think it is really
> > > better to add new mechanisms instead.
> >
> > Agreed!
> >
> > Can't resist, let me repeat... imho ptrace is unfixable ;)
>
> Hmm... I can't reproduce the problem here,

Very strange. Do you mean the test-case doesn't die? (on vanilla kernel).

> but isn't the problematic
> part here the mixing of ptrace and group stop and sliently
> transforming group stop into ptrace

Not exactly,

> and ptracer consuming the usual
> exit code instead of the ptrace specific one?

Well, unless the task dies nobody except ptrace can use ->exit_code.

The problem is:

- the task T stops, it sets ->exit_code exactly because
the tracer can attach after that

- the tracer attaches, does wait(), consumes exit_code
and exits

- another tracer attaches, but exit_code == 0

There is no STOPPED/TRACED transformation at all.

> Also, I don't agree with the notion that doing something entirely new
> would magically solve all the problems. Improvements are achieved
> through evolution. For ptrace, the situation definitely is aggravated
> by the use of wait

... and reparenting, and signals.

> and weird interaction with group stop,

Yes. And to me the main problem is not the current behaviour. The
problem is that we never tried to define the correct behavior.
OK, real_parent can miss the notification. We can fix this, but
for what? The tracer can resume the thread "silently", this doesn't
look very good anyway.

But even this doesn't matter. We can not change ptrace API so that,
say, it does not reparent the tracee. Once we do this, we already
have the new API.

So, personally I think we need the new API. And we already have
utrace which allows to implement "anything" on top of it, including
the old ptrace for compatibility.

> Actually, I think such approach is significantly harmful to
> improvements of the existing code base. Instead of encouraging
> investigating the actual problems and making sensible tradeoffs, such
> approach discourages making reasonable tradeoffs with the false
> expectation that something in the future will magically solve the
> problems. In a lot of cases, there are no unicorns. Proceeding
> forward while managing damage at reasonable level is usually the right
> way to go.

Well, perhaps I am wrong, this is only my opinion.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/