Re: [RFC PATCH] x86/hweight: Get rid of the special calling convention
From: Brian Gerst
Date: Wed May 04 2016 - 15:31:16 EST
On Wed, May 4, 2016 at 2:46 PM, Borislav Petkov <bp@xxxxxxx> wrote:
> On Thu, Apr 07, 2016 at 11:43:33AM +0200, Borislav Petkov wrote:
>> I guess we can do something like this:
>>
>> if (likely(static_cpu_has(X86_FEATURE_POPCNT)))
>> asm volatile(POPCNT32
>> : "="REG_OUT (res)
>> : REG_IN (w));
>> else
>> res = __sw_hweight32(w);
>>
>> and get rid of the custom calling convention.
>>
>> Along with some numbers showing that the change doesn't cause any
>> noticeable slowdown...
>
> Ok, here's something which seems to build and boot in kvm.
>
> I like how we don't need the special calling conventions anymore and we
> can actually say "popcnt .." and gcc selects registers.
>
> The include files hackery is kinda nasty but I had to do it because I
> needed to be able to use static_cpu_has() in a header and including
> asm/cpufeature.h pulls in all kinds of nasty dependencies. I'm certainly
> open for better ideas...
>
> ---
> From: Borislav Petkov <bp@xxxxxxx>
> Date: Wed, 4 May 2016 18:52:09 +0200
> Subject: [PATCH] x86/hweight: Get rid of the special calling convention
>
> People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench
> into experimentation with kcov, LTO, etc.
>
> And it's not like we absolutely need it, so let's get rid of it and
> streamline it a bit. I had to do some carving out of facilities so
> that the include hell doesn't swallow me, but other than that, the new
> __arch_hweight*() versions look much more palatable and gcc is more free
> to select registers than us hardcoding them in the insn bytes.
>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> ---
> arch/x86/Kconfig | 5 --
> arch/x86/include/asm/arch_hweight.h | 43 ++++---------
> arch/x86/include/asm/cpufeature.h | 112 +-------------------------------
> arch/x86/include/asm/cpuinfo.h | 65 +++++++++++++++++++
> arch/x86/include/asm/processor.h | 63 +-----------------
> arch/x86/include/asm/static_cpu_has.h | 116 ++++++++++++++++++++++++++++++++++
> lib/Makefile | 5 --
> 7 files changed, 197 insertions(+), 212 deletions(-)
> create mode 100644 arch/x86/include/asm/cpuinfo.h
> create mode 100644 arch/x86/include/asm/static_cpu_has.h
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 7bb15747fea2..79e0bcd61cb1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -292,11 +292,6 @@ config X86_32_LAZY_GS
> def_bool y
> depends on X86_32 && !CC_STACKPROTECTOR
>
> -config ARCH_HWEIGHT_CFLAGS
> - string
> - default "-fcall-saved-ecx -fcall-saved-edx" if X86_32
> - default "-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" if X86_64
> -
> config ARCH_SUPPORTS_UPROBES
> def_bool y
>
> diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
> index 02e799fa43d1..6c1a2d500c4c 100644
> --- a/arch/x86/include/asm/arch_hweight.h
> +++ b/arch/x86/include/asm/arch_hweight.h
> @@ -2,36 +2,18 @@
> #define _ASM_X86_HWEIGHT_H
>
> #include <asm/cpufeatures.h>
> +#include <asm/static_cpu_has.h>
>
> -#ifdef CONFIG_64BIT
> -/* popcnt %edi, %eax -- redundant REX prefix for alignment */
> -#define POPCNT32 ".byte 0xf3,0x40,0x0f,0xb8,0xc7"
> -/* popcnt %rdi, %rax */
> -#define POPCNT64 ".byte 0xf3,0x48,0x0f,0xb8,0xc7"
> -#define REG_IN "D"
> -#define REG_OUT "a"
> -#else
> -/* popcnt %eax, %eax */
> -#define POPCNT32 ".byte 0xf3,0x0f,0xb8,0xc0"
> -#define REG_IN "a"
> -#define REG_OUT "a"
> -#endif
> -
> -/*
> - * __sw_hweightXX are called from within the alternatives below
> - * and callee-clobbered registers need to be taken care of. See
> - * ARCH_HWEIGHT_CFLAGS in <arch/x86/Kconfig> for the respective
> - * compiler switches.
> - */
> static __always_inline unsigned int __arch_hweight32(unsigned int w)
> {
> - unsigned int res = 0;
> + unsigned int res;
>
> - asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
> - : "="REG_OUT (res)
> - : REG_IN (w));
> + if (likely(static_cpu_has(X86_FEATURE_POPCNT))) {
> + asm volatile("popcnt %[w], %[res]" : [res] "=r" (res) : [w] "r" (w));
Do all supported versions of the assembler know about the popcnt
instruction? That's why it was open-coded before. The problem is that
Intel and AMD are constantly adding new instructions, and it's a long
cycle before the user's assembler gets updated.
--
Brian Gerst