Re: [PATCH] x86/ACPI/boot: Improve __acpi_acquire_global_lock

From: Rafael J. Wysocki
Date: Mon Mar 20 2023 - 13:19:06 EST


On Mon, Feb 27, 2023 at 5:48 PM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
>
> Improve __acpi_acquire_global_lock by using a temporary variable.
> This enables compiler to perform if-conversion and improves generated
> code from:
>
> ...
> 72a: d1 ea shr %edx
> 72c: 83 e1 fc and $0xfffffffc,%ecx
> 72f: 83 e2 01 and $0x1,%edx
> 732: 09 ca or %ecx,%edx
> 734: 83 c2 02 add $0x2,%edx
> 737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> 73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
> 73d: 83 e2 03 and $0x3,%edx
> 740: 31 c0 xor %eax,%eax
> 742: 83 fa 03 cmp $0x3,%edx
> 745: 0f 95 c0 setne %al
> 748: f7 d8 neg %eax
>
> to:
>
> ...
> 72a: d1 e9 shr %ecx
> 72c: 83 e2 fc and $0xfffffffc,%edx
> 72f: 83 e1 01 and $0x1,%ecx
> 732: 09 ca or %ecx,%edx
> 734: 83 c2 02 add $0x2,%edx
> 737: f0 0f b1 17 lock cmpxchg %edx,(%rdi)
> 73b: 75 e9 jne 726 <__acpi_acquire_global_lock+0x6>
> 73d: 8d 41 ff lea -0x1(%rcx),%eax
>
> BTW: the compiler could generate:
>
> lea 0x2(%rcx,%rdx,1),%edx
>
> instead of:
>
> or %ecx,%edx
> add $0x2,%edx
>
> but unwated conversion from add to or when bits are known to be zero
> prevents this improvement. This is GCC PR108477.
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx>
> Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> Cc: Len Brown <lenb@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> ---
> arch/x86/kernel/acpi/boot.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 1c38174b5f01..577403c69983 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1853,13 +1853,14 @@ early_param("acpi_sci", setup_acpi_sci);
>
> int __acpi_acquire_global_lock(unsigned int *lock)
> {
> - unsigned int old, new;
> + unsigned int old, new, val;
>
> old = READ_ONCE(*lock);
> do {
> - new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
> + val = (old >> 1) & 0x1;
> + new = (old & ~0x3) + 2 + val;
> } while (!try_cmpxchg(lock, &old, new));
> - return ((new & 0x3) < 3) ? -1 : 0;
> + return val ? 0 : -1;

I would prefer

if (val)
return -1;

return 0;

> }
>
> int __acpi_release_global_lock(unsigned int *lock)
> --

Otherwise I agree that this change would be an improvement.

>