Re: [RFC PATCH] set TASK_TRACED before arch_ptrace code to fix a race

From: Roland McGrath
Date: Tue May 27 2008 - 00:05:44 EST


> > if happens, it should be a bug, right?

It doesn't even make sense that it should be possible.
So if it somehow is possible, that is certainly a bug.
But the mind boggles as to exactly what sort of bug it could be.

> It does happen!!

Um. Really? What does happen exactly?

> Call Trace:
> [<a000000100011bd0>] show_stack+0x50/0xa0
> sp=e000000146bbfbb0 bsp=e000000146bb0e08
> [<a000000100011c50>] dump_stack+0x30/0x60
> sp=e000000146bbfd80 bsp=e000000146bb0de8
> [<a0000001000979a0>] get_signal_to_deliver+0x60/0x6e0
> sp=e000000146bbfd80 bsp=e000000146bb0d80
> [<a0000001000343d0>] ia64_do_signal+0xb0/0xd00
> sp=e000000146bbfd80 bsp=e000000146bb0cd8
> [<a000000100012650>] do_notify_resume_user+0xf0/0x140
> sp=e000000146bbfe20 bsp=e000000146bb0ca8
> [<a00000010000aac0>] notify_resume_user+0x40/0x60
> sp=e000000146bbfe20 bsp=e000000146bb0c58
> [<a00000010000a9f0>] skip_rbs_switch+0xe0/0x110
> sp=e000000146bbfe30 bsp=e000000146bb0c58
> [<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
> sp=e000000146bc0000 bsp=e000000146bb0c58

So this here shows a perfectly normal trace that bottoms out at a syscall
entry from user mode. You seem to be saying that, somehow, inside
ptrace_stop(), we tried to return to user mode--I guess you mean losing the
kernel stack with the call chain leading to ptrace_stop()--and then
reentered the kernel as for a signal after a syscall.

> I applied the following patch , and got the call trace above..
> If apply my RFC patch as antidote, I don't see "deliver" ...

With just that diagnostic patch as shown, these might be two different
threads. But I guess you've ruled that out somehow? If this does in fact
happen in the thread that is supposed to be in ptrace_stop(), then the
trail we need to follow is in arch_ptrace_stop(), i.e. ia64_ptrace_stop().
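
One way to rule out the two-threads case is to tag every diagnostic record
with the thread that emitted it, and then look only at records from the
thread that entered ptrace_stop(). What follows is a minimal userspace
sketch of that bookkeeping, not kernel code. The record layout, the event
names, and first_deliver_inside_stop() are all hypothetical; in a real
diagnostic patch the tid would presumably come from current->pid.

```c
#include <stddef.h>

/* Hypothetical event kinds a diagnostic patch might log. */
enum trace_event {
    EV_STOP_ENTER,  /* thread entered ptrace_stop() */
    EV_STOP_EXIT,   /* thread returned from ptrace_stop() */
    EV_DELIVER      /* thread reached get_signal_to_deliver() */
};

/* One log record; tid is a stand-in for the emitting thread's id. */
struct trace_rec {
    int tid;
    enum trace_event ev;
};

/*
 * Return the index of the first EV_DELIVER logged by thread `tid` while
 * that same thread is between EV_STOP_ENTER and EV_STOP_EXIT, or -1 if
 * there is none. Records from other threads are skipped, so interleaved
 * output from a second thread cannot produce a false positive.
 */
static int first_deliver_inside_stop(const struct trace_rec *log,
                                     size_t n, int tid)
{
    int inside = 0;

    for (size_t i = 0; i < n; i++) {
        if (log[i].tid != tid)
            continue;
        switch (log[i].ev) {
        case EV_STOP_ENTER:
            inside = 1;
            break;
        case EV_STOP_EXIT:
            inside = 0;
            break;
        case EV_DELIVER:
            if (inside)
                return (int)i;
            break;
        }
    }
    return -1;
}
```

A result of -1 for the stopping thread would mean the "deliver" output came
from some other thread; any other result would point at the record where
that thread itself went astray.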

> Is the problem clear now?

I'm sorry, it's not at all clear to me.

> I will serve you until every thing is clear to you.

That's quite a commitment! My full enlightenment may be a long time off.
I won't hold you to it once we've fixed this particular bug, though. ;-)

What should be happening is that ia64_ptrace_stop() should do its work,
possibly blocking, and then return to its caller in ptrace_stop(). At no
point should it be possible for ia64_ptrace_stop() to return directly to
user mode, or to reenter notify_resume_user() in any fashion.

Please focus on the exact code path taken inside the ia64_ptrace_stop()
call. It should be possible to identify every step of that and see exactly
where it goes astray from what we expect.


Thanks,
Roland