Re: [PATCH AUTOSEL 6.0 64/67] sbitmap: fix lockup while swapping

From: Sasha Levin
Date: Thu Oct 13 2022 - 13:56:46 EST


On Wed, Oct 12, 2022 at 06:08:50PM -0700, Hugh Dickins wrote:
On Wed, 12 Oct 2022, Sasha Levin wrote:

From: Hugh Dickins <hughd@xxxxxxxxxx>

[ Upstream commit 30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0 ]

Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
is a big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
to avoid the high system time and freezes which that had introduced.

Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another: soon locking up in
sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
around with waitqueue_active() but wait_cnt 0 . Here is a backtrace,
showing the common pattern of outer sbitmap_queue_wake_up() interrupted
before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
are idle, in other cases they're spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
scsi_end_request < scsi_io_completion < scsi_finish_command <
scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
__irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
_raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
__submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
submit_bio < __swap_writepage < swap_writepage < pageout <
shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
__alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
do_anonymous_page < __handle_mm_fault < handle_mm_fault <
do_user_addr_fault < exc_page_fault < asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been interrupted,
after bringing wait_cnt down to 0 (and in this example, after doing its
wakeups), before advancing wake_index and refilling wake_cnt: an
interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.

I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
so it is better if sbq_wake_ptr() skips on to the next ws in that case:
which fixes the lockup and shows no adverse consequence for me.

The check for wait_cnt being 0 is obviously racy, and ultimately can lead
to lost wakeups: for example, when there is only a single waitqueue with
waiters. However, lost wakeups are unlikely to matter in these cases,
and a proper fix requires redesign (and benchmarking) of the batched
wakeup code: so let's plug the hole with this bandaid for now.

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
Reviewed-by: Jan Kara <jack@xxxxxxx>
Reviewed-by: Keith Busch <kbusch@xxxxxxxxxx>
Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@xxxxxxxxxx
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

Whoa! NAK to this 6.0 backport, and to the 5.19, 5.15, 5.10, 5.4
AUTOSEL backports of the same commit. I never experienced such a
lockup on those releases. Or have I missed announcements of stable
backports of the whole series of 6.1-rc commits to which this one
is a fix? (I hope not.)

Happy to drop it.

I'm happy for my NAK to be overruled by Jens or Jan or Keith,
if they see virtue in this commit, beyond what I'm aware of:
but as it stands, it looks like AUTOSEL out of control again -
it found the word "fix", and found that the commit applies cleanly,
so thinks it must be a good stable addition. Not necessarily so!

I'm a bit confused: the subject of the patch is "fix lockup while
swapping" and the body describes a lockup and that this patch "fixes the
lockup and shows no adverse consequence". What am I missing?

--
Thanks,
Sasha