Re: [PATCH -next RFC v2 3/8] sbitmap: make sure waitqueues are balanced

From: yukuai (C)
Date: Fri Apr 15 2022 - 03:07:48 EST


在 2022/04/15 14:31, Li, Ming 写道:


On 4/8/2022 3:39 PM, Yu Kuai wrote:
Currently, same waitqueue might be woken up continuously:

__sbq_wake_up __sbq_wake_up
sbq_wake_ptr -> assume 0
sbq_wake_ptr -> 0
atomic_dec_return
atomic_dec_return
atomic_cmpxchg -> succeed
atomic_cmpxchg -> failed
return true

__sbq_wake_up
sbq_wake_ptr
atomic_read(&sbq->wake_index) -> still 0
sbq_index_atomic_inc -> inc to 1
if (waitqueue_active(&ws->wait))
if (wake_index != atomic_read(&sbq->wake_index))
atomic_set -> reset from 1 to 0
wake_up_nr -> wake up first waitqueue
// continue to wake up in first waitqueue

What's worse, io hung is possible in theory because wake up might be
missed. For example, 2 * wake_batch tags are put, while only wake_batch
threads are worken:

__sbq_wake_up
atomic_cmpxchg -> reset wait_cnt
__sbq_wake_up -> decrease wait_cnt
...
__sbq_wake_up -> wait_cnt is decreased to 0 again
atomic_cmpxchg
sbq_index_atomic_inc -> increase wake_index
wake_up_nr -> wake up and waitqueue might be empty
sbq_index_atomic_inc -> increase again, one waitqueue is skipped
wake_up_nr -> invalid wake up because old wakequeue might be empty

To fix the problem, refactor to make sure waitqueues will be woken up
one by one, and also choose the next waitqueue by the number of threads
that are waiting to keep waitqueues balanced.
Hi, do you think that updating wake_index before atomic_cmpxchg(ws->wait_cnt) also can solve these two problems?
like this:
Hi,

The first problem is due to sbq_wake_ptr() is using atomic_set() to
update 'wake_index'.

The second problem is due to __sbq_wake_up() is updating 'wait_cnt'
before 'wait_index'.

__sbq_wake_up()
{
....
if (wait_cnt <= 0) {
ret = atomic_cmpxchg(sbq->wake_index, old_wake_index, next_wake_index);
How is the 'next_wake_index' chosen? And the same in sbq_wake_ptr().
if (ret == old_wake_index) {
ret = atomic_cmpxchg(ws->wait_cnt, wait_cnt, wake_batch);

If this failed, just return true with 'wake_index' updated? Then the
caller will call this again, so it seems this can't prevent 'wake_index'
updated multiple times, and 'wait_cnt' in the old 'ws' is not updated.
if (ret == wait_cnt)
wake_up_nr(ws->wait, wake_batch);
}
}
}

Your solution is picking the waitqueue with the largest waiters_cnt as the next one to be waked up, I think that waitqueue is possible to starve.
if lots of threads in a same waitqueue stop waiting before sbq wakes them up, it will cause the waiters_cnt of waitqueue is much less than others, looks like sbq_update_wake_index() would never pick this waitqueue. What do you think? is it possible?

It will be possible if adding threads to waitqueues is not balanced, and
I suppose it's not possible after tag premmption is disabled. However,
instead of chosing the waitqueue with largest waiters_cnt, chosing the
next waitqueue with 'waiters_cnt > 0' might be alternative.

Thanks,
Kuai