Re: [PATCH v6 06/15] tools/nolibc: __sysret: support syscalls who return a pointer

From: Zhangjin Wu
Date: Sun Aug 06 2023 - 08:34:22 EST


> Hi Zhangjin,
>
> On Fri, Jul 07, 2023 at 10:56:59PM +0800, Zhangjin Wu wrote:
> > No official reference states the errno range, here aligns with musl and
> > glibc and uses [-MAX_ERRNO, -1] instead of all negative ones.
> >
> > - musl: src/internal/syscall_ret.c
> > - glibc: sysdeps/unix/sysv/linux/sysdep.h
> >
> > The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc
> > defined in tools/include/nolibc/errno.h.
> >
> > Suggested-by: Willy Tarreau <w@xxxxxx>
> > Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@xxxxxx/
> > Suggested-by: David Laight <David.Laight@xxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@xxxxxxxxxxxxxxxx/
> > Signed-off-by: Zhangjin Wu <falcon@xxxxxxxxxxx>
> > ---
> > tools/include/nolibc/sys.h | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
> > index 53bc3ad6593e..3479f54d7957 100644
> > --- a/tools/include/nolibc/sys.h
> > +++ b/tools/include/nolibc/sys.h
> > @@ -28,13 +28,20 @@
> > #include "errno.h"
> > #include "types.h"
> >
> > -/* Syscall return helper, set errno as -ret when ret < 0 */
> > +
> > +/* Syscall return helper for library routines, set errno as -ret when ret is in
> > + * range of [-MAX_ERRNO, -1]
> > + *
> > + * Note, No official reference states the errno range here aligns with musl
> > + * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
> > + */
> > +
> > static __inline__ __attribute__((unused, always_inline))
> > -long __sysret(long ret)
> > +long __sysret(unsigned long ret)
> > {
> > - if (ret < 0) {
> > - SET_ERRNO(-ret);
> > - ret = -1;
> > + if (ret >= (unsigned long)-MAX_ERRNO) {
> > + SET_ERRNO(-(long)ret);
> > + return -1;
> > }
> > return ret;
> > }
>
> While running some regression tests, I found that my programs increased
> in size by 3-4% overall, which was solely attributed to this helper that
> significantly increased the size of many syscalls (particularly those
> returning ints). Let's consider this simple function:
>

I'm very sad to learn this, so sorry, but we do need a test suite for the
coverage on different toolchains, on O0/1/2/3/s, on generated size ...

> void unlink_exit(const char *name)
> {
> int ret = unlink(name);
> exit(ret < 0 ? errno : 0);
> }
>
> Before:
> $ nm --size unlink.o | grep unli
> 0000000000000030 T unlink_exit
>
> $ objdump -d -j .text --disassemble=unlink_exit unlink.o
> 000000000000003b <unlink_exit>:
> 3b: 48 89 fe mov %rdi,%rsi
> 3e: b8 07 01 00 00 mov $0x107,%eax
> 43: 31 d2 xor %edx,%edx
> 45: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi
> 4c: 0f 05 syscall
> 4e: 31 ff xor %edi,%edi
> 50: 85 c0 test %eax,%eax
> 52: 79 0a jns 5e <unlink_exit+0x23>
> 54: 89 c7 mov %eax,%edi
> 56: f7 df neg %edi
> 58: 89 3d 00 00 00 00 mov %edi,0x0(%rip) # 5e <unlink_exit+0x23>
> 5e: b8 3c 00 00 00 mov $0x3c,%eax
> 63: 40 0f b6 ff movzbl %dil,%edi
> 67: 0f 05 syscall
> 69: eb fe jmp 69 <unlink_exit+0x2e>
>
> After:
> $ nm --size unlink.o | grep unli
> 0000000000000042 T unlink_exit
>
> $ objdump -d -j .text --disassemble=unlink_exit unlink.o
> 0000000000000051 <unlink_exit>:
> 51: 48 89 fe mov %rdi,%rsi
> 54: b8 07 01 00 00 mov $0x107,%eax
> 59: 31 d2 xor %edx,%edx
> 5b: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi
> 62: 0f 05 syscall
> 64: 48 63 d0 movslq %eax,%rdx
> 67: 48 81 fa 00 f0 ff ff cmp $0xfffffffffffff000,%rdx
> 6e: 76 0a jbe 7a <unlink_exit+0x29>
> 70: f7 da neg %edx
> 72: 89 15 00 00 00 00 mov %edx,0x0(%rip) # 78 <unlink_exit+0x27>
> 78: eb 06 jmp 80 <unlink_exit+0x2f>
> 7a: 31 ff xor %edi,%edi
> 7c: 85 c0 test %eax,%eax
> 7e: 79 06 jns 86 <unlink_exit+0x35>
> 80: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # 86 <unlink_exit+0x35>
> 86: b8 3c 00 00 00 mov $0x3c,%eax
> 8b: 40 0f b6 ff movzbl %dil,%edi
> 8f: 0f 05 syscall
> 91: eb fe jmp 91 <unlink_exit+0x40>
>
> => that's 18 bytes added to retrieve the result of a syscall.
>
> There are several reasons involved:
> - the "unsigned long" argument to __sysret() forces a sign extension
> from all sys_* functions that used to return int (the movslq above).
>
> - the comparison with the error range now has to be performed on a
> long instead of an int
>
> - the return value from __sysret() is a long (note, a signed long)
> which then has to be turned back to an int before being returned
> by the caller to satisfy the caller's prototype.
>
> I could recover a part of it by replacing the __sysret() function with
> a macro that preserves the input type and avoids these useless
> conversions:
>
> #define __sysret(arg) ({ \
> typeof(arg) __sysret_arg = (arg); \
> if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \
> SET_ERRNO(-(int)__sysret_arg); \
> __sysret_arg = -1L; \
> } \
> __sysret_arg; \
> })
>
> But the remaining part is the comparison to -MAX_ERRNO inflicted on all
> integer returns where we could previously keep a simple sign comparison:
>
> 51: 48 89 fe mov %rdi,%rsi
> 54: b8 07 01 00 00 mov $0x107,%eax
> 59: 31 d2 xor %edx,%edx
> 5b: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi
> 62: 0f 05 syscall
> 64: 3d 00 f0 ff ff cmp $0xfffff000,%eax
> 69: 76 0a jbe 75 <unlink_exit+0x24>
> 6b: f7 d8 neg %eax
> 6d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 73 <unlink_exit+0x22>
> 73: eb 06 jmp 7b <unlink_exit+0x2a>
> 75: 31 ff xor %edi,%edi
> 77: 85 c0 test %eax,%eax
> 79: 79 06 jns 81 <unlink_exit+0x30>
> 7b: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # 81 <unlink_exit+0x30>
> 81: b8 3c 00 00 00 mov $0x3c,%eax
> 86: 40 0f b6 ff movzbl %dil,%edi
> 8a: 0f 05 syscall
> 8c: eb fe jmp 8c <unlink_exit+0x3b>
>
> And given that the vast majority of the syscalls return integers, I think
> we should specialize these sysret functions so that we don't add needless
> complexity for all those for which we know they're returning ints (it's
> written in the caller's prototype anyway). I.e. we can have __sysret_int()
> that is the low-overhead version and keep __sysret() the expensive one
> doing the comparison (and possibly through the macro like above if needed
> in order to avoid multiple casts).
>

Based on your macro version, I tried to use the is_signed_type() from kernel,
it seems works.

A simple test shows, before:

// ppc64
$ size nolibc-test
text data bss dec hex filename
27308 1880 80 29268 7254 nolibc-test

// mips
$ size nolibc-test
text data bss dec hex filename
23276 64 64 23404 5b6c nolibc-test

After:

// ppc64
$ size nolibc-test
text data bss dec hex filename
27136 1880 80 29096 71a8 nolibc-test

// mips
$ size nolibc-test
text data bss dec hex filename
23036 64 64 23164 5a7c nolibc-test

> I'm not going to change all that now, that's too late, but I'm a bit sad
> to see my binaries inflate just because of this, so I hope we can get back
> to this later after the current queue is flushed.
>

Ok, will send a patch with is_signed_type() for more discuss soon.

Thanks,
Willy

> Regards,
> Willy