Re: Compat 32-bit syscall entry from 64-bit task!?

From: Indan Zupancic
Date: Thu Jan 26 2012 - 01:34:57 EST


On Thu, January 26, 2012 02:08, Jamie Lokier wrote:
> Is it disambiguated by PTRACE_EVENT_EXEC happening before the execve
> returns, and you knowing the TID always changes to the PID? I haven't
> yet checked which TID gets the PTRACE_EVENT_EXEC event, but if it's
> not the old one, perhaps that could be changed.

Please don't ever change the behaviour of PTRACE_EVENT_EXEC. It's
barely documented already, and if it ever changes it will be
unreliable as well.

It's still unclear whether PTRACE_EVENT_EXEC comes before, after,
or instead of the post-execve ptrace event. I guess before, but
can I count on that? If it comes after, then I get a stray
execve event that messes up the system call cadence.
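
The cadence concern can be sketched roughly as follows. This assumes
PTRACE_O_TRACESYSGOOD is set, so syscall stops report SIGTRAP | 0x80,
and that PTRACE_EVENT stops carry the event number in the bits above
the stop signal, as ptrace(2) describes. The names classify_stop and
struct cadence are made up for illustration; this is not taken from any
existing tracer.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_SYSCALL_ENTRY,
    STOP_SYSCALL_EXIT,
    STOP_EVENT,   /* a PTRACE_EVENT_* stop such as PTRACE_EVENT_EXEC */
    STOP_OTHER
};

/* Hypothetical per-thread tracer state: the entry/exit toggle. */
struct cadence {
    bool in_syscall;
};

/* Event number of a PTRACE_EVENT stop, or 0 if there is none. */
static inline int stop_event(int status)
{
    return status >> 16;
}

/*
 * Classify one waitpid() status. Only real syscall stops flip the
 * toggle, so an event stop arriving between entry and exit does not
 * disturb the cadence.
 */
static inline enum stop_kind classify_stop(int status, struct cadence *c)
{
    assert(c);
    if (!WIFSTOPPED(status))
        return STOP_OTHER;
    if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
        c->in_syscall = !c->in_syscall;
        return c->in_syscall ? STOP_SYSCALL_ENTRY : STOP_SYSCALL_EXIT;
    }
    if (WSTOPSIG(status) == SIGTRAP && stop_event(status) != 0)
        return STOP_EVENT;
    return STOP_OTHER;
}
```

The toggle is exactly the kind of state that goes wrong when an
unexpected stop is counted as a syscall stop, which is why the ordering
question matters.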

> It would be good to improve the threaded execve() behaviour for all
> the disappearing TIDs to issue a disappearing event, and the winning
> execve changing-TID to issue an I-am-changing-TID event, anyway.

As Denys said, you get the event with the new PID, and apparently with
the latest kernel you can get the old TID with PTRACE_GETEVENTMSG.

So all the info is there to handle it statelessly now.
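
A minimal sketch of fetching the old TID, assuming the tracee is
currently stopped in a PTRACE_EVENT_EXEC stop and the kernel is new
enough to report the former thread ID through PTRACE_GETEVENTMSG. The
helper name former_tid is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Store the pre-exec thread ID of the stopped tracee `pid` in *old_tid.
 * Returns 0 on success, or -1 with errno set by ptrace().
 */
static inline int former_tid(pid_t pid, pid_t *old_tid)
{
    unsigned long msg = 0;

    assert(old_tid);
    if (ptrace(PTRACE_GETEVENTMSG, pid, NULL, &msg) == -1)
        return -1;
    *old_tid = (pid_t)msg;
    return 0;
}
```

With that, the tracer can move any per-thread bookkeeping from the old
TID to the new PID at the exec event itself, instead of guessing from
earlier stops.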

My point was that stateless handling is much preferred to stateful
handling, which is why not having the syscall mode available at the
syscall exit event would sometimes be inconvenient (the real mode
can differ from the guessed one).

>> > Thus, minimally we need one new option, PTRACE_O_TRACE_SYSENTRY -
>> > "on syscall entry ptrace stop, set a nonzero event value in wait status"
>> > , and two event values: PTRACE_EVENT_SYSCALL_ENTRY (for native entry),
>> > PTRACE_EVENT_SYSCALL_ENTRY1 for compat one.
>>
>> Not all code wants to receive a syscall exit event all the time, so
>> if you add PTRACE_O_TRACE_SYSENTRY, please add PTRACE_O_TRACE_SYSEXIT
>> too. That would pretty much halve ptrace's overhead for my use case.
>> But this is orthogonal to the compat problem.
>
> I agree. I would like to ignore the exit for most syscalls but see a
> few of them. I guess PTRACE_SETOPTIONS could be used to toggle it,
> with some overhead.

Yes, that's what I had in mind.

> But in the spirit of this thread,
> PTRACE_O_TRACE_BPF would be even better, to completely ignore
> irrelevant syscalls :-)

Yes, that's the only reason I'm interested in BPF, really.
Most system calls are either always allowed, or always denied.
Of the ones that need checking, most of them have file paths.
For those I'm not interested in the post-syscall event.
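
The split described above could be sketched as a small lookup. The
particular syscalls, the verdicts assigned to them, and the names
policy_for and struct policy are all illustrative assumptions, not an
actual policy.

```c
#include <stdbool.h>
#include <sys/syscall.h>

enum verdict {
    VERDICT_ALLOW,   /* never needs the tracer */
    VERDICT_DENY,    /* never permitted */
    VERDICT_CHECK    /* tracer must inspect the arguments */
};

/* Hypothetical per-syscall policy entry. */
struct policy {
    enum verdict verdict;
    bool want_exit;  /* does the tracer need the post-syscall event? */
};

static inline struct policy policy_for(long nr)
{
    switch (nr) {
    case SYS_getpid:
    case SYS_read:
    case SYS_write:
        return (struct policy){ VERDICT_ALLOW, false };
    case SYS_reboot:
        return (struct policy){ VERDICT_DENY, false };
    case SYS_openat:
    case SYS_execve:
        /* Path-based checks: the entry stop is enough. */
        return (struct policy){ VERDICT_CHECK, false };
    default:
        /* Assumption: anything unknown is checked conservatively. */
        return (struct policy){ VERDICT_CHECK, true };
    }
}
```

Only the VERDICT_CHECK entries would need to reach the tracer at all if
filtering happened in the kernel, and most of those would need only the
entry stop.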

>> > To future-proof this scheme we may reserve a few more event values
>> > PTRACE_EVENT_SYSCALL_ENTRY2, PTRACE_EVENT_SYSCALL_ENTRY3, etc,
>> > if we'll ever have arches with more than one non-native syscall
>> > entry. I'm no expert, but looking at strace code, ARM may already have
>> > more than one additional convention how to pass syscall args.
>>
>> Please, no! This way lies madness, just one PTRACE_EVENT_SYSCALL_ENTRY,
>> no PTRACE_EVENT_SYSCALL_ENTRY1 or PTRACE_EVENT_SYSCALL_ENTRY2, that
>> would be horrible. Keep arch specific stuff in arch specific areas,
>> please don't spread it around.
>>
>> What was wrong with using eflags again? Is it too simple or something?
>
> Well it doesn't deal with the equivalent issue on ARM and PA-RISC.

Those issues are not equivalent. ARM only has that OABI thing which
is hopefully not used in practice. Can you switch modes on-the-fly in
PA-RISC without doing a system call? Both ARM and PA-RISC use only one
struct pt_regs and one syscall table.

Greetings,

Indan

