RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

From: David Laight
Date: Fri May 03 2024 - 12:23:15 EST


From: Waiman Long
> Sent: 03 May 2024 17:00
> To: David Laight <David.Laight@xxxxxxxxxx>; 'linux-kernel@xxxxxxxxxxxxxxx' <linux-
> kernel@xxxxxxxxxxxxxxx>; 'peterz@xxxxxxxxxxxxx' <peterz@xxxxxxxxxxxxx>
> Cc: 'mingo@xxxxxxxxxx' <mingo@xxxxxxxxxx>; 'will@xxxxxxxxxx' <will@xxxxxxxxxx>; 'boqun.feng@xxxxxxxxx'
> <boqun.feng@xxxxxxxxx>; 'Linus Torvalds' <torvalds@xxxxxxxxxxxxxxxxxxxx>; 'virtualization@lists.linux-
> foundation.org' <virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx>; 'Zeng Heng' <zengheng4@xxxxxxxxxx>
> Subject: Re: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().
>
>
> On 12/31/23 23:14, Waiman Long wrote:
> >
> > On 12/31/23 16:55, David Laight wrote:
> >> per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number.
> >> This requires the cpu number be 64bit.
> >> However the value in osq_lock() comes from a 32bit xchg() and there
> >> isn't a way of telling gcc the high bits are zero (they are) so
> >> there will always be an instruction to clear the high bits.
> >>
> >> The cpu number is also offset by one (to make the initialiser 0).
> >> It seems to be impossible to get gcc to convert
> >> __per_cpu_offset[cpu_p1 - 1]
> >> into (__per_cpu_offset - 1)[cpu_p1] (transferring the offset to the
> >> address).
> >>
> >> Converting the cpu number to 32bit unsigned prior to the decrement means
> >> that gcc knows the decrement has set the high bits to zero and doesn't
> >> add a register-register move (or cltq) to zero/sign extend the value.
> >>
> >> Not massive but saves two instructions.
> >>
> >> Signed-off-by: David Laight <david.laight@xxxxxxxxxx>
> >> ---
> >>   kernel/locking/osq_lock.c | 6 ++----
> >>   1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >> index 35bb99e96697..37a4fa872989 100644
> >> --- a/kernel/locking/osq_lock.c
> >> +++ b/kernel/locking/osq_lock.c
> >> @@ -29,11 +29,9 @@ static inline int encode_cpu(int cpu_nr)
> >>       return cpu_nr + 1;
> >>   }
> >> -static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> >> +static inline struct optimistic_spin_node *decode_cpu(unsigned int encoded_cpu_val)
> >>   {
> >> -    int cpu_nr = encoded_cpu_val - 1;
> >> -
> >> -    return per_cpu_ptr(&osq_node, cpu_nr);
> >> +    return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
> >>   }
> >>     /*
> >
> > You really like micro-optimization.
> >
> > Anyway,
> >
> > Reviewed-by: Waiman Long <longman@xxxxxxxxxx>
> >
> David,
>
> Could you respin the series based on the latest upstream code?

Looks like a wet bank holiday weekend.....

David
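
For readers following the thread, the change discussed in the quoted patch can be sketched in user space as below. This is only an illustration under stated assumptions: sketch_per_cpu_offset[], NR_CPUS_SKETCH and the sketch_* helpers are hypothetical stand-ins (the real code uses per_cpu_ptr() and __per_cpu_offset[]), and the offset values are made up. Both decode forms return the same entry; the difference the patch is after is that the unsigned 32bit subtract typically leaves the upper half of the 64bit index register already zero on x86-64, so the compiler does not need a separate extend before indexing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU count, for illustration only. */
#define NR_CPUS_SKETCH 4

/* Stand-in for the kernel's __per_cpu_offset[]; the values are made up. */
static const unsigned long sketch_per_cpu_offset[NR_CPUS_SKETCH] = {
	0x1000, 0x2000, 0x3000, 0x4000
};

/* Mirrors encode_cpu(): offset by one so that 0 can mean "no CPU". */
static inline int sketch_encode_cpu(int cpu_nr)
{
	return cpu_nr + 1;
}

/*
 * Old form: signed parameter and signed subtract. The int index has to
 * be sign-extended to 64 bits before it can be used as an array index.
 */
static inline unsigned long sketch_decode_signed(int encoded_cpu_val)
{
	int cpu_nr = encoded_cpu_val - 1;

	assert(cpu_nr >= 0 && cpu_nr < NR_CPUS_SKETCH);
	return sketch_per_cpu_offset[cpu_nr];
}

/*
 * New form: unsigned parameter, so the decrement is an unsigned 32bit
 * operation and the index is known to be zero-extended afterwards.
 */
static inline unsigned long sketch_decode_unsigned(unsigned int encoded_cpu_val)
{
	unsigned int idx = encoded_cpu_val - 1;

	assert(idx < NR_CPUS_SKETCH);
	return sketch_per_cpu_offset[idx];
}
```

Comparing the generated code for the two decode helpers (for example with gcc -O2 -S) is one way to check whether the extra extend instruction actually disappears on a given compiler and target; the result is not guaranteed to match the instruction counts quoted in the patch description.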
