Re: [PATCH] cpu/hotplug: Cache number of online CPUs

From: Mathieu Desnoyers
Date: Thu Jul 04 2019 - 18:00:59 EST


----- On Jul 4, 2019, at 5:10 PM, Thomas Gleixner tglx@xxxxxxxxxxxxx wrote:

> On Thu, 4 Jul 2019, Mathieu Desnoyers wrote:
>
>> ----- On Jul 4, 2019, at 4:42 PM, Thomas Gleixner tglx@xxxxxxxxxxxxx wrote:
>>
>> > Re-evaluating the bitmap weight of the online CPUs bitmap in every
>> > invocation of num_online_cpus() over and over is a pretty useless
>> > exercise, especially when num_online_cpus() is used in code paths like the
>> > IPI delivery of x86 or the membarrier code.
>> >
>> > Cache the number of online CPUs in the core and just return the cached
>> > variable.
>> >
>> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> > ---
>> > include/linux/cpumask.h | 16 +++++++---------
>> > kernel/cpu.c | 16 ++++++++++++++++
>> > 2 files changed, 23 insertions(+), 9 deletions(-)
>> >
>> > --- a/include/linux/cpumask.h
>> > +++ b/include/linux/cpumask.h
>> > @@ -95,8 +95,13 @@ extern struct cpumask __cpu_active_mask;
>> > #define cpu_present_mask ((const struct cpumask *)&__cpu_present_mask)
>> > #define cpu_active_mask ((const struct cpumask *)&__cpu_active_mask)
>> >
>> > +extern unsigned int __num_online_cpus;
>>
>> [...]
>>
>> > +
>> > +void set_cpu_online(unsigned int cpu, bool online)
>> > +{
>> > + lockdep_assert_cpus_held();
>>
>> I don't think it is required that the cpu_hotplug lock is held
>> when reading __num_online_cpus, right ?
>
> Errm, that's the update function. And this is better called from a hotplug
> lock held region and not from some random crappy code.

Sure, it is fine to assume this lock is held for the update.
It's the read side I'm worried about (which does not hold the lock).

>
>> I would have expected the increment/decrement below to be performed
>> with a WRITE_ONCE(), and use a READ_ONCE() when reading the current
>> value.
>
> What for?
>
> num_online_cpus() is racy today vs. CPU hotplug operations as
> long as you don't hold the hotplug lock.

Fair point. AFAIU, none of the loads performed within num_online_cpus()
rely on atomic or volatile accesses, so not using a volatile access to
load the cached value should not introduce any regression.

However, I'm concerned that some code may rely on the cached value being
re-fetched between iterations of a loop. Without READ_ONCE(), the
compiler is allowed to hoist the load out of the loop, keep the value in
a register, and never re-fetch it, unless there is a cpu_relax() or a
barrier() within the loop.
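
To make the concern concrete, here is a minimal userspace sketch. It is
not the patch itself: READ_ONCE()/WRITE_ONCE() are simplified volatile-cast
stand-ins for the kernel macros, and cached_online_cpus and the sketch_*()
helpers are hypothetical names used only for illustration. The point is
that the polling loop goes through a volatile load, so the compiler has to
re-read the counter on every iteration instead of hoisting it.

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption:
 * GCC/Clang, which provide __typeof__). The volatile cast forces the
 * compiler to emit exactly one real load or store per use.
 */
#define READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the cached counter; the boot CPU is online. */
static unsigned int cached_online_cpus = 1;

/*
 * Update side. In the kernel this would run with the hotplug lock held,
 * so updates are serialized and only the store needs to be marked.
 */
static void sketch_set_cpu_online(int online)
{
	if (online) {
		WRITE_ONCE(cached_online_cpus, cached_online_cpus + 1);
	} else {
		assert(cached_online_cpus > 0);
		WRITE_ONCE(cached_online_cpus, cached_online_cpus - 1);
	}
}

/* Lockless read side: the value may be stale, but it is re-fetched per call. */
static unsigned int sketch_num_online_cpus(void)
{
	return READ_ONCE(cached_online_cpus);
}

/*
 * Poll until at least 'want' CPUs are reported online, or give up after
 * max_spins iterations. Returns the number of iterations spent waiting.
 * With a plain load in sketch_num_online_cpus(), the compiler could
 * legally read the counter once and spin on a register copy.
 */
static unsigned long sketch_wait_for_cpus(unsigned int want,
					  unsigned long max_spins)
{
	unsigned long spins = 0;

	while (sketch_num_online_cpus() < want && spins < max_spins)
		spins++;
	return spins;
}
```

A caller would pair sketch_set_cpu_online() on the hotplug path with
sketch_wait_for_cpus() in whatever code polls the count. The marked
accesses do not remove the race against hotplug, they only guarantee
that each iteration observes a freshly loaded value.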

Thoughts ?

Thanks,

Mathieu


>
> Thanks,
>
> tglx

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com