Re: [PATCH] percpu_ref: call wake_up_all() after percpu_ref_put() completes

From: Qi Zheng
Date: Fri Apr 08 2022 - 02:28:54 EST

On 2022/4/8 1:57 PM, Dennis Zhou wrote:
On Fri, Apr 08, 2022 at 12:14:54PM +0800, Qi Zheng wrote:


On 2022/4/8 12:10 PM, Andrew Morton wrote:
On Fri, 8 Apr 2022 12:06:20 +0800 Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:



Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>

Are any users affected by this? If so, I think a Fixes tag
is necessary.

It looks like all current users (blk_pre_runtime_suspend() and
set_in_sync()) are affected by this.

I see that this patch has been merged into the mm tree. Could Andrew
help me add the following Fixes tag?

Andrew is helpful ;)

Do you see reasons why we should backport this into -stable trees?
It's 8 years old, so my uninformed guess is "no"?

Hmm, although commit 490c79a65708 added the wake_up_all(), it was not a
problem for the usage at that time, so maybe the correct Fixes tag is the
following:

Fixes: 210f7cdcf088 ("percpu-refcount: support synchronous switch to atomic mode.")

But in fact, there is nothing wrong with that commit itself; the problem
is that all current users expect the refcount to be stable after
percpu_ref_switch_to_atomic_sync() returns.

I'm not sure which Fixes tag to add.

Well the solution to that problem is to add cc:stable and let Greg
figure it out ;)

The more serious question is "should we backport this". What is the
end-user-visible impact of the bug? Do our users need the fix or not?

The impact on current users is that they may still see a nonzero count
right after percpu_ref_switch_to_atomic_sync() returns, and so miss an
opportunity to see the refcount reach 0, due to case B in the commit
message:


Did you find this bug through code inspection or was the finding
motivated by a production incident?

I found this bug through code inspection, because I want to use
percpu_ref_switch_to_atomic_sync() + percpu_ref_is_zero() to do something
similar.
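To make that caller pattern concrete, here is a minimal single-threaded sketch in C. It is a simplified userspace model, not kernel code: every name in it (struct model_ref, confirm_wake_then_put(), count_seen_by_waiter(), and so on) is hypothetical. It assumes, as in lib/percpu-refcount.c, that the switch to atomic mode holds one extra reference which the confirm callback drops with percpu_ref_put(), and it models the woken percpu_ref_is_zero() caller as reading the count immediately, i.e. the worst-case interleaving.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; these names are not the kernel's percpu_ref API. */
struct model_ref {
	atomic_long count; /* atomic-mode count after the switch */
	bool switching;    /* stands in for ref->data->confirm_switch != NULL */
	long seen;         /* count read by the woken is_zero() caller */
};

/* Worst case for the waiter: it runs as soon as it is woken and reads
 * the count before the confirm path continues. */
static void model_wake_up_all(struct model_ref *ref)
{
	assert(!ref->switching); /* the waiter's wake condition holds */
	ref->seen = atomic_load(&ref->count);
}

/* Ordering before the patch: wake first, drop the extra ref afterwards. */
static void confirm_wake_then_put(struct model_ref *ref)
{
	ref->switching = false;
	model_wake_up_all(ref);
	atomic_fetch_sub(&ref->count, 1); /* percpu_ref_put() analogue */
}

/* Ordering after the patch: drop the extra ref, then wake. */
static void confirm_put_then_wake(struct model_ref *ref)
{
	ref->switching = false;
	atomic_fetch_sub(&ref->count, 1);
	model_wake_up_all(ref);
}

/* Start with no user references plus the one extra reference the switch
 * holds until its confirm step; return what the waiter observed. */
static long count_seen_by_waiter(void (*confirm)(struct model_ref *))
{
	struct model_ref ref;

	atomic_init(&ref.count, 1);
	ref.switching = true;
	ref.seen = -1;
	confirm(&ref);
	return ref.seen;
}
```

With no user references left, count_seen_by_waiter(confirm_wake_then_put) returns 1, so the caller concludes the ref is not zero, while count_seen_by_waiter(confirm_put_then_wake) returns 0.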


The usage in block/blk-pm.c looks problematic, but I'm guessing this is
a really, really hard bug to trigger. You need to have the wake up be
faster than an atomic decrement. The q_usage_counter allows reinit so it
skips the __percpu_ref_exit() call.

Thanks,
Dennis

Agreed. I manually added delays in wake_up_all() and percpu_ref_put()
to trigger case B.
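Because the window between the wakeup and the atomic decrement is so small, reproducing case B needed injected delays. A hypothetical pthreads sketch can pin that interleaving deterministically by replacing the delay with a handshake, so the waiter always runs between the wakeup and the put under the old ordering. This is again a userspace model, not kernel code; all names (struct race_model, confirm_old_order(), reproduce_case_b()) are made up, and it assumes POSIX threads and C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of the race; not the kernel's percpu_ref. */
struct race_model {
	pthread_mutex_t lock;
	pthread_cond_t switch_waitq; /* stands in for percpu_ref_switch_waitq */
	pthread_cond_t checked_cv;   /* test-only handshake replacing a delay */
	bool switching;              /* stands in for data->confirm_switch != NULL */
	bool checked;                /* set once the waiter has read the count */
	atomic_long count;           /* starts at 1: the switch's extra ref */
	long seen;                   /* value the waiter's zero check observed */
};

/* Old ordering: wake waiters first, drop the extra ref afterwards.
 * Waiting for 'checked' pins the waiter inside that window. */
static void *confirm_old_order(void *arg)
{
	struct race_model *m = arg;

	pthread_mutex_lock(&m->lock);
	m->switching = false;                     /* confirm_switch = NULL */
	pthread_cond_broadcast(&m->switch_waitq); /* wake_up_all() */
	while (!m->checked)
		pthread_cond_wait(&m->checked_cv, &m->lock);
	pthread_mutex_unlock(&m->lock);

	atomic_fetch_sub(&m->count, 1);           /* percpu_ref_put() */
	return NULL;
}

/* Caller: wait for the switch to finish, then check for zero. */
static void *waiter(void *arg)
{
	struct race_model *m = arg;

	pthread_mutex_lock(&m->lock);
	while (m->switching)
		pthread_cond_wait(&m->switch_waitq, &m->lock);
	m->seen = atomic_load(&m->count);         /* percpu_ref_is_zero() */
	m->checked = true;
	pthread_cond_signal(&m->checked_cv);
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

/* Returns the count the waiter saw; *final_count gets the settled count. */
static long reproduce_case_b(long *final_count)
{
	struct race_model m;
	pthread_t w, c;

	pthread_mutex_init(&m.lock, NULL);
	pthread_cond_init(&m.switch_waitq, NULL);
	pthread_cond_init(&m.checked_cv, NULL);
	m.switching = true;
	m.checked = false;
	atomic_init(&m.count, 1);
	m.seen = -1;

	if (pthread_create(&w, NULL, waiter, &m) != 0)
		abort();
	if (pthread_create(&c, NULL, confirm_old_order, &m) != 0)
		abort();
	pthread_join(w, NULL);
	pthread_join(c, NULL);

	*final_count = atomic_load(&m.count);
	pthread_cond_destroy(&m.checked_cv);
	pthread_cond_destroy(&m.switch_waitq);
	pthread_mutex_destroy(&m.lock);
	return m.seen;
}
```

The waiter always reports 1 even though the count settles at 0. The checked_cv handshake exists only to force the interleaving; in the kernel nothing holds the confirm path back, which is why the race is hard to hit without artificial delays.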

--
Thanks,
Qi