Re: Compat 32-bit syscall entry from 64-bit task!?

From: Jamie Lokier
Date: Wed Jan 25 2012 - 20:09:44 EST


Indan Zupancic wrote:
> On Thu, January 26, 2012 00:32, Denys Vlasenko wrote:
> > On Wednesday 25 January 2012 20:36, Oleg Nesterov wrote:
> >> IOW. Currently ptrace_report_syscall() does
> >>
> >> ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
> >>
> >> We can add the new events,
> >>
> >> PTRACE_EVENT_SYSCALL_ENTRY
> >> PTRACE_EVENT_SYSCALL_COMPAT_ENTRY
> >> PTRACE_EVENT_SYSCALL_EXIT
> >> PTRACE_EVENT_SYSCALL_COMPAT_EXIT
> >
> > We can get away with just the first one.
> > (1) It's unlikely people would want to get native sysentry events but not compat ones,
> > thus first two options can be combined into one;
>
> True.
>
> > (2) syscall exit compat-ness is known from entry type - no need to indicate it; and
> > (3) if we would flag syscall entry with an event value in wait status, then syscall
> > exit will already be distinguishable.
>
> False for execve which messes everything up by changing TID sometimes.

Is it disambiguated by PTRACE_EVENT_EXEC happening before the execve
returns, and you knowing the TID always changes to the PID? I haven't
yet checked which TID gets the PTRACE_EVENT_EXEC event, but if it's
not the old one, perhaps that could be changed.
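
For illustration, here is roughly how a tracer can already tell a
PTRACE_EVENT_EXEC stop from a syscall stop by looking only at the wait
status. It is a minimal sketch, assuming the tracer has set
PTRACE_O_TRACESYSGOOD and PTRACE_O_TRACEEXEC; the names stop_kind and
classify_stop are made up for this example and are not part of any API.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Illustrative classification of a waitpid() status (names are made up). */
enum stop_kind {
    STOP_NOT_STOPPED,   /* tracee exited or was killed, not a ptrace stop */
    STOP_SYSCALL,       /* syscall entry or exit stop (PTRACE_O_TRACESYSGOOD) */
    STOP_EXEC_EVENT,    /* PTRACE_EVENT_EXEC stop (PTRACE_O_TRACEEXEC) */
    STOP_OTHER          /* signal delivery stop or another event */
};

static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NOT_STOPPED;

    /* With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;

    /* Event stops carry the event code in the bits above the signal. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return STOP_EXEC_EVENT;

    return STOP_OTHER;
}
```

Note that the status alone does not say whether a syscall stop is an
entry or an exit, nor which TID it will be reported under after a
threaded execve; the tracer still has to track that itself.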

It would be good to improve the threaded execve() behaviour for all
the disappearing TIDs to issue a disappearing event, and the winning
execve changing-TID to issue an I-am-changing-TID event, anyway.

> > Thus, minimally we need one new option, PTRACE_O_TRACE_SYSENTRY -
> > "on syscall entry ptrace stop, set a nonzero event value in wait status",
> > and two event values: PTRACE_EVENT_SYSCALL_ENTRY (for native entry) and
> > PTRACE_EVENT_SYSCALL_ENTRY1 (for the compat one).
>
> Not all code wants to receive a syscall exit event all the time, so
> if you add PTRACE_O_TRACE_SYSENTRY, please add PTRACE_O_TRACE_SYSEXIT
> too. That would pretty much halve ptrace's overhead for my use case.
> But this is orthogonal to the compat problem.

I agree. I would like to ignore the exit for most syscalls but see a
few of them. I guess PTRACE_SETOPTIONS could be used to toggle it,
with some overhead. But in the spirit of this thread,
PTRACE_O_TRACE_BPF would be even better, to completely ignore
irrelevant syscalls :-)
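
To make the toggling idea concrete, here is a sketch of how a tracer
might pick its option mask at each syscall entry stop. It is purely
hypothetical: PTRACE_O_TRACE_SYSENTRY and PTRACE_O_TRACE_SYSEXIT are
only proposals in this thread, so the HYP_* bits below are placeholder
values and the helper names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * HYPOTHETICAL placeholders for the options proposed in this thread.
 * They are not defined by any kernel header and must not be passed to
 * ptrace(); the bit values are arbitrary.
 */
#define HYP_O_TRACE_SYSENTRY 0x1u
#define HYP_O_TRACE_SYSEXIT  0x2u

/* True if the tracer also wants the exit stop for syscall number nr. */
static bool wants_exit(long nr, const long *interesting, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (interesting[i] == nr)
            return true;
    return false;
}

/*
 * Option mask the tracer would like to be in effect after the entry
 * stop for syscall nr: always report entries, report the exit only for
 * the few syscalls listed in interesting[].
 */
static unsigned options_after_entry(long nr, const long *interesting, size_t n)
{
    unsigned opts = HYP_O_TRACE_SYSENTRY;

    if (wants_exit(nr, interesting, n))
        opts |= HYP_O_TRACE_SYSEXIT;
    return opts;
}
```

A tracer would compare the result with the mask currently in effect
and call PTRACE_SETOPTIONS only when it differs, which is where the
extra overhead mentioned above comes from.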

> > To future-proof this scheme we may reserve a few more event values
> > PTRACE_EVENT_SYSCALL_ENTRY2, PTRACE_EVENT_SYSCALL_ENTRY3, etc,
> > if we'll ever have arches with more than one non-native syscall
> > entry. I'm no expert, but looking at strace code, ARM may already have
> > more than one additional convention for passing syscall args.
>
> Please, no! This way lies madness, just one PTRACE_EVENT_SYSCALL_ENTRY,
> no PTRACE_EVENT_SYSCALL_ENTRY1 or PTRACE_EVENT_SYSCALL_ENTRY2, that
> would be horrible. Keep arch specific stuff in arch specific areas,
> please don't spread it around.
>
> What was wrong with using eflags again? Is it too simple or something?

Well it doesn't deal with the equivalent issue on ARM and PA-RISC.

-- Jamie