Re: [GIT PULL] x86/build changes for v4.17

From: Matthias Kaehlcke
Date: Wed Apr 04 2018 - 17:46:46 EST


On Wed, Apr 04, 2018 at 11:11:36PM +0200, Arnd Bergmann said:

> On Wed, Apr 4, 2018 at 10:58 PM, Matthias Kaehlcke <mka@xxxxxxxxxxxx> wrote:
> > On Wed, Apr 04, 2018 at 10:33:19PM +0200, Arnd Bergmann said:
> >>
> >> In most cases, this is used to implement a fast-path for a helper
> >> function, so not doing it the same way as gcc just results in
> >> slower execution, but I assume we also have code that behaves
> >> differently on clang compared to gcc because of this.
> >
> > I think I didn't come (knowingly) across that one yet. Could you point
> > me to an instance that could be used as an example in a bug report?
>
> This code
>
> #include <linux/math64.h>
>
> int f(u64 u)
> {
> 	return div_u64(u, 100000);
> }
>
> results in a call to __do_div64() on 32-bit arm when compiled with
> clang, but gets optimized into a multiply+shift sequence with gcc.
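
(For readers following the thread: the gcc optimization Arnd describes
is the classic reciprocal-multiplication trick, where division by a
compile-time constant is replaced by a multiply with a precomputed
"magic" constant followed by a shift. Below is a minimal userspace
sketch of the idea, using divisor 10 and its well-known 32-bit magic
constant; it illustrates the transformation, it is not the exact code
gcc emits.)

#include <assert.h>
#include <stdint.h>

/*
 * Divide by the constant 10 without a division instruction:
 * for every 32-bit x, x / 10 == (x * 0xCCCCCCCD) >> 35, where
 * 0xCCCCCCCD is ceil(2^35 / 10). gcc derives such magic
 * constants automatically for any constant divisor.
 */
static uint32_t div10(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

int main(void)
{
	assert(div10(0) == 0);
	assert(div10(99) == 9);
	assert(div10(4294967295u) == 429496729u);
	return 0;
}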

I understand this is annoying, but it seems I'm missing something:

static inline u64 div_u64(u64 dividend, u32 divisor)
{
	u32 remainder;
	return div_u64_rem(dividend, divisor, &remainder);
}

static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
{
	*remainder = do_div(dividend, divisor);
	return dividend;
}

#define do_div(n, base) __div64_32(&(n), base)
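
(As a side note, do_div() has unusual semantics: the macro updates the
dividend in place to hold the quotient and returns the remainder, which
is why __div64_32() takes a pointer. A short sketch, assuming kernel
context for the u64/u32 types; do_div_example() is a made-up name:)

/* do_div() modifies n in place and returns the remainder. */
static u32 do_div_example(void)
{
	u64 n = 1000003;
	u32 rem = do_div(n, 100000);	/* now n == 10, rem == 3 */

	return rem;
}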

static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
{
	register unsigned int __base      asm("r4") = base;
	register unsigned long long __n   asm("r0") = *n;
	register unsigned long long __res asm("r2");
	register unsigned int __rem       asm(__xh);
	asm(	__asmeq("%0", __xh)
		__asmeq("%1", "r2")
		__asmeq("%2", "r0")
		__asmeq("%3", "r4")
		"bl	__do_div64"
		: "=r" (__rem), "=r" (__res)
		: "r" (__n), "r" (__base)
		: "ip", "lr", "cc");
	*n = __res;
	return __rem;
}
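
(For reference, the __builtin_constant_p() fast path Arnd mentioned
typically looks something like the sketch below. This only illustrates
the general idiom as compilable userspace C; example_div64() and
slow_div64() are made-up names, not actual kernel helpers:)

#include <assert.h>
#include <stdint.h>

/* Stand-in for an expensive out-of-line routine like __do_div64(). */
static uint64_t slow_div64(uint64_t n, uint32_t base)
{
	return n / base;
}

/*
 * When base is a compile-time constant power of two, the branch
 * below folds away and only the shift remains; otherwise the
 * out-of-line helper is called. Whether the branch folds depends
 * on how aggressively the compiler evaluates
 * __builtin_constant_p(), which is where gcc and clang can differ.
 */
static inline uint64_t example_div64(uint64_t n, uint32_t base)
{
	if (__builtin_constant_p(base) && base != 0 &&
	    (base & (base - 1)) == 0)
		return n >> __builtin_ctz(base);
	return slow_div64(n, base);
}

int main(void)
{
	assert(example_div64(1000, 8) == 125);	/* shift path */
	assert(example_div64(1000, 7) == 142);	/* fallback path */
	return 0;
}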

There is no reference to __builtin_constant_p() in this call chain;
could you elaborate?

Also, you mentioned there are plenty of cases; maybe there is a more
straightforward one?

In any case, this is drifting a bit from the original topic of the
thread. Shall we take this offline?