Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

From: Zhangjian (Bamvor)
Date: Fri May 13 2016 - 04:12:18 EST


Hi,

On 2016/5/12 23:28, Catalin Marinas wrote:
On Thu, May 12, 2016 at 05:24:57PM +0300, Yury Norov wrote:
On Thu, May 12, 2016 at 03:07:35PM +0100, Catalin Marinas wrote:
On Thu, May 12, 2016 at 04:44:31PM +0300, Yury Norov wrote:
On Thu, May 12, 2016 at 02:35:34PM +0100, Catalin Marinas wrote:
On Thu, May 12, 2016 at 03:20:00AM +0300, Yury Norov wrote:
I debugged preadv02 and pwritev02 failures and found very weird bug.
Test passes {iovec_base = 0xffffffff, iovec_len = 64} as one element
of vector, and kernel reports successful read/write.

There are 2 problems:
1. How kernel allows such address to be passed to fs subsystem;
2. How fs successes to read/write at non-mapped, and in fact non-user
address.

I don't know the answer on 2'nd question, and it might be something
generic. But I investigated first problem.

The problem is that compat_rw_copy_check_uvector() uses access_ok() to
validate user address, and on arm64 it ends up with checking buffer
end against current_thread_info()->addr_limit.

current_thread_info()->addr_limit for ilp32, and most probably for
aarch32 is equal to aarch64 one, and so adress_ok() doesn't fail.
It happens because on thread creation we call flush_old_exec() to set
addr_limit, and completely ignore compat mode there.

[...]

--- a/arch/arm64/kernel/binfmt_elf32.c
+++ b/arch/arm64/kernel/binfmt_elf32.c
@@ -12,6 +12,7 @@
do { \
clear_thread_flag(TIF_32BIT_AARCH64); \
set_thread_flag(TIF_32BIT); \
+ set_fs(TASK_SIZE_32); \
} while (0)

#define COMPAT_ARCH_DLINFO
diff --git a/arch/arm64/kernel/binfmt_ilp32.c b/arch/arm64/kernel/binfmt_ilp32.c
index a934fd4..a8599c6 100644
--- a/arch/arm64/kernel/binfmt_ilp32.c
+++ b/arch/arm64/kernel/binfmt_ilp32.c
@@ -59,6 +59,7 @@ static void cputime_to_compat_timeval(const cputime_t cputime,
do { \
set_thread_flag(TIF_32BIT_AARCH64); \
clear_thread_flag(TIF_32BIT); \
+ set_fs(TASK_SIZE_32); \
} while (0)

I don't think we need these two. AFAICT, flush_old_exec() takes care of
setting the USER_DS for the new thread.

That's true, but USER_DS depends on personality which is not set yet
for new thread, as I wrote above. In fact, I tried correct USER_DS
only, and it doesn't work

Ah, it looks like load_elf_binary() sets the personality after
flush_old_exec(). Looking at powerpc and x86, they set USER_DS to the
maximum 64-bit task value, so they should have a similar issue with
native 32-bit vs compat behaviour.

Hmmm. If so, it means we'd introduce generic fix. It would be removing
set_fs() from flush_old_exec() and appending it to load_elf_binary()
after SET_PERSONALITY(). But I think it should be agreed with other
arches developers.

The set_fs() in flush_old_exec() is probably fine, it may be meant to
re-set the USER_DS for the old thread.

It appears that at least powerpc and x86 don't have different USER_DS
setting for native and compat, so moving the set_fs() call further down
would not make any difference for them, nor will it fix the preadv02 LTP
test (if it fails for them, I haven't checked).

I've sent standalone patch for aarch64 (you in CC) so let's move
discussion there.

I've seen the patch but we would lose some discussion history here. I
think we should continue this thread and just summarise the conclusion
in reply to the other patch. This thread is also available on
linux-arch, in case other architecture maintainers follow it.

So what exactly is LTP complaining about? Is different error (like
EFAULT vs EINVAL) or not getting an error at all.

It should be EINVAL, but it succeed. The other problem is that
following fs routines does not complain on wrong address.

I see. The test asks the kernel to write a single byte (out of maximum
64) to the user address 0xffffffff.
What address We should set for this limitation, TASK_SIZE or STACK_TOP?
It is same for 64bit application. But STACK_TOP(0xffff0000) is below
TASK_SIZE in 32bit application. The address above STACK_TOP is preserved
for 32bit application.

Regards

Bamvor

> In the absence of the access_ok()
check, this operation succeeds. If the preadv syscall gets 2 bytes as
the count, then it would fail with EFAULT.

While it's not really a bug, I agree that for matching the native 32-bit
behavior (basically for other syscalls like those involving vfs_read()),
the simplest fix would be to have a dynamic USER_DS.