Re: [PATCH][RFC] fix PTRACE_KILL

From: Eric W. Biederman
Date: Fri Aug 27 2021 - 18:06:46 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> [ Sorry, this got missed by other stuff in my inbox ]
>
> On Tue, Aug 24, 2021 at 10:12 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> The change I'd proposed makes PTRACE_KILL deliver SIGKILL regardless of
>> the target state; yours is arguably what we should've done from the very
>> beginning (what we used to have prior to 0.99pl14 and what all other
>> Unices had been doing all along), but it's a visible change wrt error
>> returns and I don't see any sane way to paper over that part.
>>
>> Linus, what would you prefer? I've no strong preferences here...
>
> Honestly, I have no huge preferences either, simply because this has
> clearly been broken so long that people who care have worked around
> the breakage already, or it just didn't matter enough.

The two choices are to change PTRACE_KILL into either
"kill(SIGKILL)" or "ptrace(PTRACE_CONT, SIGKILL)".

While trying to figure out how userspace understands PTRACE_KILL
today, I have found at least one piece of userspace that takes the
definition of PTRACE_KILL to be PTRACE_CONT with the signal SIGKILL.

Frankly I find that a fair characterization, because, except for the
return code when the task is not stopped, PTRACE_KILL works exactly
like PTRACE_CONT with signal SIGKILL.
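
For reference, the current handling looks roughly like this (I am
paraphrasing kernel/ptrace.c from memory, so treat it as a sketch
rather than the exact source):

	/*
	 * In ptrace_request(): apart from the exit_state check,
	 * PTRACE_KILL is routed through the same resume path as
	 * PTRACE_CONT, just with SIGKILL as the signal.
	 */
	case PTRACE_KILL:
		if (child->exit_state)	/* already dead */
			return 0;
		return ptrace_resume(child, request, SIGKILL);

The error-return difference comes from the ptrace() entry point, which
passes ignore_state to ptrace_check_attach for PTRACE_KILL and so never
returns -ESRCH for a tracee that is not stopped.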

As there are real userspace-visible differences between PTRACE_CONT and
kill(SIGKILL), I think we should go with the PTRACE_CONT definition,
as that is the least likely to break userspace, and the only thing we
really care about here is making it so that malicious userspace cannot
cause the kernel to misbehave.
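
To make that concrete, the kind of tracer code that relies on the
PTRACE_CONT reading looks something like this (a minimal, made-up
example, not the actual userspace I found):

	#include <signal.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	/* End tracing of a tracee that is sitting in ptrace-stop. */
	static void kill_tracee(pid_t pid)
	{
		int status;

		waitpid(pid, &status, 0);	/* tracee enters ptrace-stop */
		if (WIFSTOPPED(status))
			/* Expected to act as PTRACE_CONT with SIGKILL. */
			ptrace(PTRACE_KILL, pid, 0, 0);
		waitpid(pid, &status, 0);	/* reap the killed tracee */
	}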

> I don't think Eric's patch works, because if I read that one right, it
> would actually do that "wait_task_inactive()" which is very much
> against the whole point of PTRACE_KILL. We want to kill the target
> asap, and regardless of where it is stuck.

It will do wait_task_inactive in one of the cases where PTRACE_KILL is
broken today. But since the cost of a wait_task_inactive call looks very
much like the cost of taking a spin lock, I don't see the problem. The
code in ptrace_check_attach already waits on several other spin locks.

The code in ptrace_check_attach verifies that the target task is in
state TASK_TRACED, which means that the target task is in ptrace_stop
and has already done "set_special_state(TASK_TRACED)". At that point
there are two possibilities: either the target task will have
reached freezable_schedule in ptrace_stop and wait_task_inactive won't
block at all, or wait_task_inactive will need to spin waiting for
the target task to reach freezable_schedule.

There is nothing in ptrace_stop that can sleep, as all of the work
happens under a spin lock, so the worst-case wait in wait_task_inactive
should be quite short.
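
For context, the path in question looks roughly like this (again
paraphrased from memory, so a sketch rather than the exact source):

	static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
	{
		int ret = -ESRCH;

		read_lock(&tasklist_lock);
		if (child->ptrace && child->parent == current) {
			/* Fails unless the tracee is already TASK_TRACED. */
			if (ignore_state || ptrace_freeze_traced(child))
				ret = 0;
		}
		read_unlock(&tasklist_lock);

		if (!ret && !ignore_state) {
			/*
			 * Everything between set_special_state(TASK_TRACED)
			 * and freezable_schedule() in ptrace_stop() runs
			 * under spinlocks, so at worst this spins briefly
			 * until the tracee is off the CPU.
			 */
			if (!wait_task_inactive(child, __TASK_TRACED)) {
				ptrace_unfreeze_traced(child);
				ret = -ESRCH;
			}
		}

		return ret;
	}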


All of that said, it is probably worth adding logic to my patch so
that when ptrace_freeze_traced or wait_task_inactive fails, the
code bails out immediately and returns 0 instead of -ESRCH, just to
avoid any userspace behavioral differences in PTRACE_KILL. That way
we don't even need to consider regressions.
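
Concretely, that would look something like this at the ptrace() entry
point (a sketch of the idea, not the final patch):

	/* Stop ignoring the tracee's state for PTRACE_KILL ... */
	ret = ptrace_check_attach(child, request == PTRACE_INTERRUPT);
	if (ret < 0) {
		/* ... but keep its historical "always returns 0". */
		if (request == PTRACE_KILL)
			ret = 0;
		goto out_put_task_struct;
	}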

Eric