Re: Compat 32-bit syscall entry from 64-bit task!?
From: Indan Zupancic
Date: Mon Feb 06 2012 - 20:53:01 EST
On Mon, February 6, 2012 18:02, H. Peter Anvin wrote:
> On 02/06/2012 12:32 AM, Indan Zupancic wrote:
>> It seems that just using eflags is a lot simpler than the alternatives,
>> let's just go for it.
>> I propose using bits somewhere in the middle of the upper half. If new
>> flags are ever added by Intel or AMD, they will use the lower bits. If
>> anyone else ever adds flags, they most likely add them to the top (VIA).
>> So the middle seems the safest spot as far as long-term maintenance goes.
>> The below version does that, but instead of setting one of the two bits,
>> it always sets bit 50 for newer kernels and sets bit 51 if it's a compat
>> system call. I find this version more readable and after compilation it's
>> also a couple of bytes smaller compared to Linus' original version.
>> Should we make sure that the top 32 bits are zero, in case any weird
>> hardware does set our bits?
> [Adding H.J. Lu, since he has run into some of these requirements before]
> NAK in the extreme.
> We have not heard back from the architecture people on this, and I will
> NAK this unless that happens.
> Furthermore, you're picking bits that do not work for 32 bits, EVEN
> THOUGH WE HAVE A SIMILAR PROBLEM ON 32 BITS; I outlined it for you and
> you chose to ignore it.
Sorry, I missed that. I looked up that email and you indeed did, though
you didn't give any details about what the problems are.
> Finally, I think we actually are going to need a fair number of bits in
> the end. All of this points to using a new regset designed for
> extension in the first place.
> As far as I can tell, we need at least the following information:
> - If the CPU is currently in 32- or 64-bit mode.
What is the best way to find that out at the kernel side? Add a function
that checks cs and returns the correct answer? But in the kernel path the
CPU is always in 64-bit mode, so I suppose you want to know what mode the
tracee was in?
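
As a rough illustration of the "check cs" idea, here is a minimal sketch of how the tracee's mode could be classified from its saved cs selector. It assumes the standard x86-64 Linux user code selectors (0x33 for __USER_CS, 0x23 for __USER32_CS). The enum and function names are made up for this example and are not existing kernel or ptrace interfaces. Anything else, such as a custom LDT segment, is reported as unknown.

```c
#include <stdint.h>

/* User code segment selectors on x86-64 Linux (assumed standard GDT layout). */
#define USER_CS_64 0x33u /* __USER_CS: 64-bit user code */
#define USER_CS_32 0x23u /* __USER32_CS: 32-bit (compat) user code */

/* Hypothetical result type, for illustration only. */
enum tracee_mode {
    TRACEE_MODE_UNKNOWN,
    TRACEE_MODE_32,
    TRACEE_MODE_64
};

/* Classify the tracee's CPU mode from the cs value saved in its registers. */
static enum tracee_mode tracee_mode_from_cs(uint64_t cs)
{
    switch (cs & 0xffffu) { /* only the low 16 bits hold the selector */
    case USER_CS_64:
        return TRACEE_MODE_64;
    case USER_CS_32:
        return TRACEE_MODE_32;
    default:
        return TRACEE_MODE_UNKNOWN; /* e.g. a segment from the LDT */
    }
}
```

A ptrace client could feed it the cs field it gets back from PTRACE_GETREGS.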
> - If we are currently inside a system call, and if so if it was entered
> - SYSCALL64
> - INT 80
> - SYSCALL32
> - SYSENTER
> The reason we need this information is because for the various 32-bit
> entry points we do some very ugly swizzling of registers, which
> matters to a ptrace client which wants to modify system call
But isn't the swizzling done in such a way that all this is hidden from
ptrace clients (and the rest of the kernel)? Why would a ptrace client
need to know the details of the 32-bit entry call?
The ptrace client can always modify the same registers, as system calls
always use the same registers too. No unexpected behaviour happens as
far as I can tell from looking at the code, at least not in the syscall
E.g. ENTRY(ia32_cstar_target) in ia32entry.S does:
movq %rbp,RCX-ARGOFFSET(%rsp) /* this lies slightly to ptrace */
This hides that, for SYSCALL32, arg2 comes in ebp instead of rcx. Same for arg6.
(I actually can't find a SYSCALL32 entry in entry_32.S, am I blind or
was it too slow until the 64-bit Athlons showed up?)
A pure 32-bit kernel is compiled with:
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
So all arguments are passed on the stack and those arguments can be
directly modified by ptrace. For compat kernels the arguments are
reloaded after ptrace and before the actual system call is done.
> - If the process was started as a 64-bit process, i386 process or x32
Can't that be figured out by looking at the AUXV data? Either via /proc
or PTRACE_GETREGSET + NT_AUXV. And as this can't change, there is no
need to pass it on all the time.
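
For what it's worth, looking something up in the AUXV data is simple once the raw bytes are in hand (for example, read from /proc/<pid>/auxv). The sketch below assumes the 64-bit layout, where each entry is a pair of 64-bit words and the vector ends with a type-0 (AT_NULL) entry. The struct and function names are hypothetical, and which entry best distinguishes the process type is not something this message settles.

```c
#include <stddef.h>
#include <stdint.h>

/* One auxiliary vector entry as laid out for a 64-bit process (assumption). */
struct auxv64_entry {
    uint64_t type;  /* AT_* constant */
    uint64_t value; /* associated value */
};

#define AUXV_AT_NULL 0 /* terminator entry (AT_NULL) */

/*
 * Search at most n entries for the given type, stopping at AT_NULL.
 * Returns 1 and stores the value in *out if found, 0 otherwise.
 */
static int auxv64_lookup(const struct auxv64_entry *v, size_t n,
                         uint64_t type, uint64_t *out)
{
    for (size_t i = 0; i < n && v[i].type != AUXV_AT_NULL; i++) {
        if (v[i].type == type) {
            *out = v[i].value;
            return 1;
        }
    }
    return 0;
}
```

A 32-bit tracee would use 32-bit words per entry instead, so a real client would need a second variant of this.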
> This adds up to a minimum of six bits already (and at least two bits on
> i386), and that's just a start.
I'm not convinced that there is any real problem. It seems only one extra
bit for the task CPU mode would be needed, so three bits in total.
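
To make the two bits from the patch concrete, a ptrace client could decode them roughly as follows. This is only a sketch of the scheme described above (bit 50 always set by newer kernels, bit 51 set for a compat system call). The macro, enum, and function names are invented for illustration. The third bit for the task CPU mode has no position assigned in this thread, so it is left out.

```c
#include <stdint.h>

/* Bit positions as proposed in this thread; names are hypothetical. */
#define EFLAGS_SYSCALL_INFO   (UINT64_C(1) << 50) /* kernel reports syscall type */
#define EFLAGS_SYSCALL_COMPAT (UINT64_C(1) << 51) /* entered via a compat path */

enum syscall_kind {
    SYSCALL_KIND_UNKNOWN, /* older kernel: bit 50 not set, no information */
    SYSCALL_KIND_NATIVE,  /* 64-bit system call */
    SYSCALL_KIND_COMPAT   /* 32-bit compat system call */
};

/* Decode the proposed bits from the eflags value seen by the tracer. */
static enum syscall_kind syscall_kind_from_eflags(uint64_t eflags)
{
    if (!(eflags & EFLAGS_SYSCALL_INFO))
        return SYSCALL_KIND_UNKNOWN;
    return (eflags & EFLAGS_SYSCALL_COMPAT) ? SYSCALL_KIND_COMPAT
                                            : SYSCALL_KIND_NATIVE;
}
```

Always setting bit 50 is what lets the client tell "native call on a new kernel" apart from "old kernel that says nothing".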