Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

From: Dan Moulding
Date: Tue Mar 19 2024 - 10:16:28 EST


> Thanks a lot for the testing! Can you also give the following patch a try?
> It removes the change to blk_plug, because Dan and Song are worried
> about performance degradation; we need to verify the performance
> before considering that patch.
>
> Anyway, I think the following patch can fix this problem as well.
>
> Thanks,
> Kuai
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 3ad5f3c7f91e..ae8665be9940 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6728,6 +6728,9 @@ static void raid5d(struct md_thread *thread)
>  		int batch_size, released;
>  		unsigned int offset;
>
> +		if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
> +			goto skip;
> +
>  		released = release_stripe_list(conf, conf->temp_inactive_list);
>  		if (released)
>  			clear_bit(R5_DID_ALLOC, &conf->cache_state);
> @@ -6766,6 +6769,7 @@ static void raid5d(struct md_thread *thread)
>  			spin_lock_irq(&conf->device_lock);
>  		}
>  	}
> +skip:
>  	pr_debug("%d stripes handled\n", handled);
>
>  	spin_unlock_irq(&conf->device_lock);

Yes, this patch also seems to work. I cannot reproduce the problem on
6.8-rc7 or 6.8.1 with just this one applied.
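For anyone skimming the thread, the control flow the quoted hunk adds can be illustrated with a small user-space C sketch. It is only an analogy, not kernel code: a pthread mutex stands in for conf->device_lock (taken with spin_lock_irq() in raid5d), an atomic_ulong stands in for mddev->sb_flags, and struct fake_conf, worker_pass(), set_flag(), clear_flag(), flag_is_set() and SB_CHANGE_PENDING are all hypothetical names. It shows only the shape of the change (check the flag at the top of the loop and jump past the loop to the single unlock), not why the original hang occurs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, arbitrary bit number standing in for MD_SB_CHANGE_PENDING. */
#define SB_CHANGE_PENDING 2

/* Hypothetical stand-in for the parts of r5conf/mddev the hunk touches. */
struct fake_conf {
	pthread_mutex_t device_lock; /* plays the role of conf->device_lock */
	atomic_ulong sb_flags;       /* plays the role of mddev->sb_flags */
	int pending_stripes;         /* queued work the loop would handle */
};

static void fake_conf_init(struct fake_conf *conf, int stripes)
{
	pthread_mutex_init(&conf->device_lock, NULL);
	atomic_init(&conf->sb_flags, 0UL);
	conf->pending_stripes = stripes;
}

static void set_flag(atomic_ulong *flags, int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * 8));
	atomic_fetch_or(flags, 1UL << bit);
}

static void clear_flag(atomic_ulong *flags, int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * 8));
	atomic_fetch_and(flags, ~(1UL << bit));
}

static bool flag_is_set(atomic_ulong *flags, int bit)
{
	return (atomic_load(flags) >> bit) & 1UL;
}

/* One pass of a raid5d-like worker loop; returns the number of items handled. */
static int worker_pass(struct fake_conf *conf)
{
	int handled = 0;

	pthread_mutex_lock(&conf->device_lock);
	while (1) {
		/* Mirrors the added check: with a superblock update pending,
		 * handle nothing and leave the loop with the lock still held. */
		if (flag_is_set(&conf->sb_flags, SB_CHANGE_PENDING))
			goto skip;

		if (conf->pending_stripes == 0)
			break;
		conf->pending_stripes--;
		handled++;
	}
skip:
	printf("%d stripes handled\n", handled);
	/* Both exits (break and goto) reach the single unlock below. */
	pthread_mutex_unlock(&conf->device_lock);
	return handled;
}
```

With the flag set, worker_pass() returns 0 and leaves the queued work in place; after the flag is cleared, the next pass drains the queue. Either way the lock is released exactly once, which is what jumping to a label placed before the existing unlock (rather than returning early) preserves.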

Cheers!

-- Dan