Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays

From: Jan Kara
Date: Mon Jul 21 2025 - 07:07:52 EST


On Mon 14-07-25 21:03:25, Baokun Li wrote:
> Because the list is traversed while holding a spin_lock, the buddy
> cannot be loaded, so ext4_try_lock_group() cannot be used directly.
> This can lead to a bouncing scenario: spin_is_locked(grp_A) reports the
> group as available, but ext4_try_lock_group() then fails, forcing the
> list traversal to repeatedly restart from grp_A.
>
> In contrast, linear traversal uses ext4_try_lock_group() directly,
> avoiding this bouncing. We therefore need a lockless, ordered traversal
> to achieve linear-like efficiency.
>
> To that end, this commit converts both the average fragment size lists
> and the largest free order lists into ordered xarrays.
>
> In an xarray, the index represents the block group number and the value
> holds the block group information; a non-empty value indicates the block
> group's presence.
>
> While insertion and deletion remain O(1), lookup complexity changes
> from O(1) to O(n log n), which may slightly reduce single-threaded
> performance.
>
> Additionally, xarray insertions may fail, most likely due to memory
> allocation failures. However, since linear traversal remains available
> as a fallback, this is not a major problem, so insertion failures only
> trigger a warning message here.
>
> A helper function, ext4_mb_find_good_group_xarray(), is added to find
> good groups in the specified xarray, starting at position start; when
> the scan reaches ngroups-1 it wraps around to 0 and continues up to
> start-1. This ensures an ordered traversal within the xarray.
>
> Performance test results are as follows: Single-process operations
> on an empty disk show negligible impact, while multi-process workloads
> demonstrate a noticeable performance gain.
>
> |CPU: Kunpeng 920 | P80 | P1 |
> |Memory: 512GB |------------------------|-------------------------|
> |960GB SSD (0.5GB/s)| base | patched | base | patched |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%) | 316141 | 315636 (-0.2%) |
> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
>
> |CPU: AMD 9654 * 2 | P96 | P1 |
> |Memory: 1536GB |------------------------|-------------------------|
> |960GB SSD (1GB/s) | base | patched | base | patched |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%) | 214243 | 212678 (-0.7%) |
> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

The patch looks good and the results are nice. I've just noticed two typos:

> +static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
^^^ destroy


> +{
> + for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> + xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
> + kfree(sbi->s_mb_avg_fragment_size);
> +}
> +
> +static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
^^^ destroy

> +{
> + for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> + xa_destroy(&sbi->s_mb_largest_free_orders[i]);
> + kfree(sbi->s_mb_largest_free_orders);
> +}

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR