Re: [PATCH] percpu_rwsem: fix missed wakeup due to reordering of load

From: Prateek Sood
Date: Wed Dec 12 2018 - 09:26:19 EST


On 12/04/2018 01:06 AM, Prateek Sood wrote:
> On 12/03/2018 12:08 PM, Davidlohr Bueso wrote:
>> On 2018-11-30 07:10, Prateek Sood wrote:
>>> In a scenario where cpu_hotplug_lock percpu_rw_semaphore is already
>>> acquired for read operation by P1 using percpu_down_read().
>>>
>>> Now we have P1 in the path of releaseing the cpu_hotplug_lock and P2
>>> is in the process of acquiring cpu_hotplug_lock.
>>>
>>> P1ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ P2
>>> percpu_up_read() pathÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ percpu_down_write() path
>>>
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ rcu_sync_enter() //gp_state=GP_PASSED
>>>
>>> rcu_sync_is_idle() //returns falseÂÂÂÂÂÂÂ down_write(rw_sem)
>>>
>>> __percpu_up_read()
>>>
>>> [L] task = rcu_dereference(w->task) //NULL
>>>
>>> smp_rmb()ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ [S] w->task = current
>>>
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ smp_mb()
>>>
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ [L] readers_active_check() //fails
>>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ schedule()
>>>
>>> [S] __this_cpu_dec(read_count)
>>>
>>> Since load of task can result in NULL. This can lead to missed wakeup
>>> in rcuwait_wake_up(). Above sequence violated the following constraint
>>> in rcuwait_wake_up():
>>>
>>> ÂÂÂÂ WAITÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ WAKE
>>> [S] tsk = currentÂÂÂÂÂ [S] cond = true
>>> MB (A)ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ MB (B)
>>> [L] condÂÂÂÂÂÂÂÂÂ [L] tsk
>>>
>>
>> Hmm yeah we don't want rcu_wake_up() to get hoisted over the __this_cpu_dec(read_count). The smp_rmb() does not make sense to me here in the first place. Did you run into this scenario by code inspection or you actually it the issue?
>>
>> Thanks,
>> Davidlohr
>
> I have checked one issue where it seems that cpu hotplug code
> path is not able to get cpu_hotplug_lock in write mode and there
> is a reader pending for cpu hotplug path to release
> percpu_rw_semaphore->rwsem to acquire cpu_hotplug_lock.
> This caused a deadlock.
>
> From code inspection also it seems to be not adhering to arm64
> smp_rmb() constraint of load/load-store ordering guarantee.
>
>
> Thanks,
> Prateek
>

Folks,

Please confirm if the suspicion of smp_rmb is correct.
IMO, it should be smp_mb() translating to dmb ish.


Thanks
Prateek

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project