Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomicreaders to prevent CPU offline

From: Srivatsa S. Bhat
Date: Thu Dec 06 2012 - 14:38:04 EST


On 12/07/2012 12:58 AM, Steven Rostedt wrote:
> On Fri, 2012-12-07 at 00:18 +0530, Srivatsa S. Bhat wrote:
>> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>>> On 12/06, Srivatsa S. Bhat wrote:
>>>>
>>>> +void get_online_cpus_atomic(void)
>>>> +{
>>>> + int c, old;
>>>> +
>>>> + preempt_disable();
>>>> + read_lock(&hotplug_rwlock);
>>>
>>> Confused... Why it also takes hotplug_rwlock?
>>
>> To avoid ABBA deadlocks.
>>
>> hotplug_rwlock was meant for the "light" readers.
>> The atomic counters were meant for the "heavy/full" readers.
>> I wanted them to be able to nest in any manner they wanted,
>> such as:
>>
>> Full inside light:
>>
>> get_online_cpus_atomic_light()
>> ...
>> get_online_cpus_atomic_full()
>> ...
>> put_online_cpus_atomic_full()
>> ...
>> put_online_cpus_atomic_light()
>>
>> Or, light inside full:
>>
>> get_online_cpus_atomic_full()
>> ...
>> get_online_cpus_atomic_light()
>> ...
>> put_online_cpus_atomic_light()
>> ...
>> put_online_cpus_atomic_full()
>>
>> To allow this, I made the two sets of APIs take the locks
>> in the same order internally.
>>
>> (I had some more description of this logic in the changelog
>> of 2/10; the only difference there is that instead of atomic
>> counters, I used rwlocks for the full-readers as well.
>> https://lkml.org/lkml/2012/12/5/320)
>>
>
> You know reader locks can deadlock with each other, right? And this
> isn't caught be lockdep yet. This is because rwlocks have been made to
> be fair with writers. Before writers could be starved if a CPU always
> let a reader in. Now if a writer is waiting, a reader will block behind
> the writer. But this has introduced new issues with the kernel as
> follows:
>
>
> CPU0 CPU1 CPU2 CPU3
> ---- ---- ---- ----
> read_lock(A);
> read_lock(B)
> write_lock(A) <- block
> write_lock(B) <- block
> read_lock(B) <-block
>
> read_lock(A) <- block
>
> DEADLOCK!
>

The root-cause of this deadlock is again lock-ordering mismatch right?
CPU0 takes locks in order A, B
CPU1 takes locks in order B, A

And the writer facilitates in actually getting deadlocked.

I avoid this in this patchset by always taking the locks in the same
order. So we won't be deadlocking like this.

Regards,
Srivatsa S. Bhat


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/