Re: [PATCH 01/51] CPU hotplug: Provide lockless versions of callback registration functions

From: Srivatsa S. Bhat
Date: Tue Feb 11 2014 - 14:14:18 EST


On 02/11/2014 10:45 PM, Oleg Nesterov wrote:
> On 02/11, Srivatsa S. Bhat wrote:
>>
>> +static DECLARE_RWSEM(cpu_hotplug_rwsem);
>> +
>> +void cpu_notifier_register_begin(void)
>> +{
>> + down_read(&cpu_hotplug_rwsem);
>> +}
>> +
>> +void cpu_notifier_register_end(void)
>> +{
>> + up_read(&cpu_hotplug_rwsem);
>> +}
>> +
>> /* Serializes the updates to cpu_online_mask, cpu_present_mask */
>> static DEFINE_MUTEX(cpu_add_remove_lock);
>>
>> @@ -32,12 +45,14 @@ static DEFINE_MUTEX(cpu_add_remove_lock);
>> */
>> void cpu_maps_update_begin(void)
>> {
>> + down_write(&cpu_hotplug_rwsem);
>> mutex_lock(&cpu_add_remove_lock);
>> }
>>
>> void cpu_maps_update_done(void)
>> {
>> mutex_unlock(&cpu_add_remove_lock);
>> + up_write(&cpu_hotplug_rwsem);
>> }
>
> I am a bit confused... If we do this, why we can't simply turn
> cpu_add_remove_lock into rw_semaphore?
>

Short answer: Being a mutex, cpu_add_remove_lock ensures that the updates to
the cpu notifier chain get serialized. If we make that an rw-semaphore, then
the notifier chain mutations (during callback registration) will run in
parallel, wreaking havoc.

Long answer: There are two distinct phases in the critical section involving
the callback registration - one that should run in parallel with other
readers (other such critical sections) and the other one which should run
serially, as depicted below.

    cpu_notifier_register_begin();          |  Run in parallel
                                            |  with similar phases
    for_each_online_cpu(cpu)                |  from other subsystems.
            init_cpu(cpu);                  |

    /* Updates the cpu notifier chain. */
    register_cpu_notifier(&foobar_cpu_notifier);    ||| -- Must run serially

    cpu_notifier_register_end();


So, for the first part, we can use an rw-semaphore, to allow the init
routines of various subsystems to run in parallel. For the second part,
we need strict mutual exclusion; so we can use the cpu_add_remove_lock
mutex as it is. But it so happens that the length of the critical section
for both these locks is exactly the same on the hotplug writer side - they
both need to cover the full hotplug code, including the CPU_POST_DEAD stage.

I do agree that this approach introduces yet another lock in the hotplug
path. However, we can nicely abstract it into APIs that the rest of the
subsystems can call (as shown above), without needing to know the internal
lock ordering etc.

Thoughts?

Regards,
Srivatsa S. Bhat
