Re: [clang] stack protector and f1f029c7bf

From: Tom Stellard
Date: Fri May 25 2018 - 13:35:20 EST


On 05/25/2018 10:31 AM, Nick Desaulniers wrote:
> On Fri, May 25, 2018 at 9:53 AM <hpa@xxxxxxxxx> wrote:
>> On May 25, 2018 9:46:42 AM PDT, Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> wrote:
>>> On Fri, May 25, 2018 at 9:33 AM <hpa@xxxxxxxxx> wrote:
>>>> On May 25, 2018 9:27:40 AM PDT, Nick Desaulniers
>>> <ndesaulniers@xxxxxxxxxx> wrote:
>>> When you say
>>>
>>>> It still should be available as as inline, however, but now "extern
>>> inline".
>>>
>>> Am I understanding correctly that native_save_fl should be inlined into
>>> all
>>> call sites (modulo the problematic pv_irq_ops.save_fl case)? Because
>>> for
>>> these two assembly implementations, it's not, but maybe there's
>>> something
>>> missing in my implementation?
>
>> Yes, that's what "extern inline" means. Maybe it needs a must inline
> annotation, but that's really messed up.
>

What about doing something like suggested here:
https://bugs.llvm.org/show_bug.cgi?id=37512#c17

This would keep the definition in C and make it easier for compilers
to inline.

-Tom

> I don't think it's possible to inline a function from an external
> translation unit without something like LTO.
>
> If I move the implementation of native_save_fl() to a separate .c (with out
> of line assembly) or .S, neither clang nor gcc will inline that assembly to
> any call sites, whether the declaration of native_save_fl() looks like:
>
> extern inline unsigned long native_save_fl(void);
>
> or
>
> __attribute__((always_inline))
>
> extern inline unsigned long native_save_fl(void);
>
> I think an external copy is the best approach for the paravirt code:
>
> diff --git a/arch/x86/kernel/irqflags.c b/arch/x86/kernel/irqflags.c
> new file mode 100644
> index 000000000000..e173ba8bee7b
> --- /dev/null
> +++ b/arch/x86/kernel/irqflags.c
> @@ -0,0 +1,24 @@
> +#include <asm/asm.h>
> +
> +extern unsigned long native_save_fl_no_stack_protector(void);
> +extern void native_restore_fl_no_stack_protector(unsigned long flags);
> +
> +asm(
> +".pushsection .text;"
> +".global native_save_fl_no_stack_protector;"
> +".type native_save_fl_no_stack_protector, @function;"
> +"native_save_fl_no_stack_protector:"
> +"pushf;"
> +"pop %" _ASM_AX ";"
> +"ret;"
> +".popsection");
> +
> +asm(
> +".pushsection .text;"
> +".global native_restore_fl_no_stack_protector;"
> +".type native_restore_fl_no_stack_protector, @function;"
> +"native_restore_fl_no_stack_protector:"
> +"push %" _ASM_DI ";"
> +"popf;"
> +"ret;"
> +".popsection");
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..8824d01c0c35 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -61,6 +61,7 @@ obj-y += alternative.o i8253.o
> hw_breakpoint.o
> obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
> obj-y += pci-iommu_table.o
> obj-y += resource.o
> +obj-y += irqflags.o
>
> obj-y += process.o
> obj-y += fpu/
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -322,9 +322,12 @@ struct pv_time_ops pv_time_ops = {
> .steal_clock = native_steal_clock,
> };
>
> +extern unsigned long native_save_fl_no_stack_protector(void);
> +extern void native_restore_fl_no_stack_protector(unsigned long flags);
> +
> __visible struct pv_irq_ops pv_irq_ops = {
> - .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
> - .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
> + .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl_no_stack_protector),
> + .restore_fl =
> __PV_IS_CALLEE_SAVE(native_restore_fl_no_stack_protector),
> .irq_disable = __PV_IS_CALLEE_SAVE(native_irq_disable),
> .irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
> .safe_halt = native_safe_halt,
>
> Thoughts?
>