Hi,
在 2025/06/12 15:45, Wang Jinchao 写道:
在 2025/6/12 15:31, Yu Kuai 写道:
Hi,The element size is different between the old pool and the new pool.
在 2025/06/11 16:55, Wang Jinchao 写道:
In the raid1_reshape function, newpool is
allocated on the stack and assigned to conf->r1bio_pool.
This results in conf->r1bio_pool.wait.head pointing
to a stack address.
Accessing this address later can lead to a kernel panic.
Example access path:
raid1_reshape()
{
// newpool is on the stack
mempool_t newpool, oldpool;
// initialize newpool.wait.head to stack address
mempool_init(&newpool, ...);
conf->r1bio_pool = newpool;
}
raid1_read_request() or raid1_write_request()
{
alloc_r1bio()
{
mempool_alloc()
{
// if pool->alloc fails
remove_element()
{
--pool->curr_nr;
}
}
}
}
mempool_free()
{
if (pool->curr_nr < pool->min_nr) {
// pool->wait.head is a stack address
// wake_up() will try to access this invalid address
// which leads to a kernel panic
return;
wake_up(&pool->wait);
}
}
Fix:
The solution is to avoid using a stack-based newpool.
Instead, directly initialize conf->r1bio_pool.
Signed-off-by: Wang Jinchao <wangjinchao600@xxxxxxxxx>
---
v1 -> v2:
- change subject
- use mempool_init(&conf->r1bio_pool) instead of reinitializing the list on stack
---
drivers/md/raid1.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 19c5a0ce5a40..f2436262092a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3366,7 +3366,7 @@ static int raid1_reshape(struct mddev *mddev)
* At the same time, we "pack" the devices so that all the missing
* devices have the higher raid_disk numbers.
*/
- mempool_t newpool, oldpool;
+ mempool_t oldpool;
struct pool_info *newpoolinfo;
struct raid1_info *newmirrors;
struct r1conf *conf = mddev->private;
@@ -3375,9 +3375,6 @@ static int raid1_reshape(struct mddev *mddev)
int d, d2;
int ret;
- memset(&newpool, 0, sizeof(newpool));
- memset(&oldpool, 0, sizeof(oldpool));
-
/* Cannot change chunk_size, layout, or level */
if (mddev->chunk_sectors != mddev->new_chunk_sectors ||
mddev->layout != mddev->new_layout ||
@@ -3408,26 +3405,33 @@ static int raid1_reshape(struct mddev *mddev)
newpoolinfo->mddev = mddev;
newpoolinfo->raid_disks = raid_disks * 2;
- ret = mempool_init(&newpool, NR_RAID_BIOS, r1bio_pool_alloc,
- rbio_pool_free, newpoolinfo);
- if (ret) {
- kfree(newpoolinfo);
- return ret;
- }
newmirrors = kzalloc(array3_size(sizeof(struct raid1_info),
- raid_disks, 2),
- GFP_KERNEL);
+ raid_disks, 2),
+ GFP_KERNEL);
if (!newmirrors) {
kfree(newpoolinfo);
- mempool_exit(&newpool);
return -ENOMEM;
}
+ /* stop everything before switching the pool */
freeze_array(conf, 0);
- /* ok, everything is stopped */
+ /* backup old pool in case restore is needed */
oldpool = conf->r1bio_pool;
- conf->r1bio_pool = newpool;
+
+ ret = mempool_init(&conf->r1bio_pool, NR_RAID_BIOS, r1bio_pool_alloc,
+ rbio_pool_free, newpoolinfo);
+ if (ret) {
+ kfree(newpoolinfo);
+ kfree(newmirrors);
+ mempool_exit(&conf->r1bio_pool);
+ /* restore the old pool */
+ conf->r1bio_pool = oldpool;
+ unfreeze_array(conf);
+ pr_err("md/raid1:%s: cannot allocate r1bio_pool for reshape\n",
+ mdname(mddev));
+ return ret;
+ }
for (d = d2 = 0; d < conf->raid_disks; d++) {
struct md_rdev *rdev = conf->mirrors[d].rdev;
Any specific reason not to use mempool_resize() and krealloc() here?
In the case if new raid_disks is greater than the old one.
mempool_resize only resizes the pool size (i.e., the number of elements in pool->elements), but does not handle changes in element size, which occurs in raid1_reshape.
Another reason may be to avoid modifying the old pool directly — in case initializing the new pool fails, the old one remains usable.
If we modify the old pool directly and the operation fails, not only will the reshaped RAID be unusable, but the original RAID may also be corrupted.
Yes, you're right, thanks for the explanation.
I feel like raid1_reshape() need better coding, anyway, your fix looks
good.
Reviewed-by: Yu Kuai <yukuai3@xxxxxxxxxx>
Thanks,
Kuai
.