Re: [PATCH v6 05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()

From: Kemeng Shi
Date: Tue Sep 12 2023 - 03:02:21 EST




on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>
>
> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>>
>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>> ^^^ update.
>>
>>>
>>> Separated freeing is safe with concurrent allocation as long as:
>>> 1. Blocks are first allocated in the buddy bitmap, and then in the block bitmap.
>>> 2. Blocks are first freed in the block bitmap, and then in the buddy bitmap.
>>> Then a freed block only becomes available for allocation once both the buddy
>>> bitmap and the block bitmap have been updated by the free.
>>> Allocation already obeys rule 1, so just do separated freeing following rule 2.
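
To make the ordering rules above concrete, here is a minimal user-space
sketch (not ext4 code; all names are invented for illustration): a freed
block only looks allocatable again once both bitmaps have been updated.

    #include <stdbool.h>
    #include <stdio.h>

    static bool buddy_used[8];   /* stand-in for the buddy bitmap */
    static bool block_used[8];   /* stand-in for the on-disk block bitmap */

    /* Rule 1: allocation marks the buddy bitmap first, then the block bitmap. */
    static void alloc_block(int b)
    {
            buddy_used[b] = true;
            block_used[b] = true;
    }

    /* Rule 2: freeing clears the block bitmap first, then the buddy bitmap. */
    static void free_in_block_bitmap(int b) { block_used[b] = false; }
    static void free_in_buddy_bitmap(int b) { buddy_used[b] = false; }

    /* A block is handed out again only when both bitmaps agree it is free. */
    static bool allocatable(int b) { return !buddy_used[b] && !block_used[b]; }

    int main(void)
    {
            alloc_block(3);
            free_in_block_bitmap(3);
            printf("after block bitmap free only: allocatable=%d\n", allocatable(3));
            free_in_buddy_bitmap(3);
            printf("after both frees:             allocatable=%d\n", allocatable(3));
            return 0;
    }
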
>>
>> So we also don't need ext4_mb_load_buddy_gfp() before freeing the on-disk
>> bitmap, right? Continue below...
>>
>>>
>>> Separated freeing has no race with generate_buddy as:
>>> Once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date
>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>> initialization of the buddy page will be executed concurrently until
>>> the buddy page is unloaded. As we always free in a "load buddy, free,
>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>>> ---
>>> fs/ext4/mballoc.c | 18 ++++++++----------
>>> 1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e650eac22237..01ad36a1cc96 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> if (err)
>>> goto error_return;
>>
>>
>> Let me add a piece of code before the added changes and continue...
>>
>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>> GFP_NOFS|__GFP_NOFAIL);
>> if (err)
>> goto error_return;
>>>
>>> + ext4_lock_group(sb, block_group);
>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> + ext4_free_group_clusters_set(sb, gdp, ret);
>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> + ext4_group_desc_csum_set(sb, block_group, gdp);
>>> + ext4_unlock_group(sb, block_group);
>>> +
>>
>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>> freeing on-disk bitmap blocks? Can you explain why, please?
>> At least it is not very clear to me why we need
>> ext4_mb_load_buddy_gfp() before the clearing of bitmap blocks. If it's not
>> needed, then I think we should separate out the bitmap freeing logic and
>> the buddy freeing logic even further.
> Yes, ext4_mb_load_buddy_gfp() is not needed for clearing the bitmap; it is
> placed before the bit clearing to catch errors and abort the mark early,
> and to reduce the functional change.
Hi Ritesh, sorry to bother you. I'm going to push a new version, and I prefer
to call ext4_mb_load_buddy_gfp() early to catch any potential error in advance.
I just wonder whether this is fine with you. I'd like to hear any advice. Thanks!
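
The intended order of operations inside ext4_mb_clear_bb() would then look
roughly like the condensed sketch below. It reuses the calls quoted in this
thread and assumes the surrounding declarations; it is not a literal excerpt
of the patch, and the metadata/discard handling is omitted:

    err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
                                 GFP_NOFS|__GFP_NOFAIL);
    if (err)
            goto error_return;      /* fail before anything is modified */

    /* free in the on-disk block bitmap first ... */
    ext4_lock_group(sb, block_group);
    mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
    ext4_unlock_group(sb, block_group);

    /* ... then in the buddy bitmap, and only then unload the buddy */
    ext4_lock_group(sb, block_group);
    mb_free_blocks(inode, &e4b, bit, count_clusters);
    ext4_unlock_group(sb, block_group);

    ext4_mb_unload_buddy(&e4b);
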
>>
>> Thoughts?
>>
>>> /*
>>> * We need to make sure we don't reuse the freed block until after the
>>> * transaction is committed. We make an exception if the inode is to be
>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> new_entry->efd_tid = handle->h_transaction->t_tid;
>>>
>>> ext4_lock_group(sb, block_group);
>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> ext4_mb_free_metadata(handle, &e4b, new_entry);
>>> } else {
>>> - /* need to update group_info->bb_free and bitmap
>>> - * with group lock held. generate_buddy look at
>>> - * them with group lock_held
>>> - */
>>> if (test_opt(sb, DISCARD)) {
>>> err = ext4_issue_discard(sb, block_group, bit,
>>> count_clusters, NULL);
>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>
>>> ext4_lock_group(sb, block_group);
>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> mb_free_blocks(inode, &e4b, bit, count_clusters);
>>> }
>>>
>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> - ext4_free_group_clusters_set(sb, gdp, ret);
>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> - ext4_group_desc_csum_set(sb, block_group, gdp);
>>> ext4_unlock_group(sb, block_group);
>>>
>>> if (sbi->s_log_groups_per_flex) {
>>
>>
>> Adding a piece of code here...
>>
>> if (sbi->s_log_groups_per_flex) {
>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>> atomic64_add(count_clusters,
>> &sbi_array_rcu_deref(sbi, s_flex_groups,
>> flex_group)->free_clusters);
>> }
>>
>> <...>
>>
>> /* We dirtied the bitmap block */
>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> /* And the group descriptor block */
>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>> if (!err)
>> err = ret;
>>
>> I was thinking even this could be moved next to the bitmap freeing logic
>> above, so that the next patch makes it very clear that all of the bitmap
>> freeing logic is simply moved into the ext4_mb_mark_context() function.
>>
>> -ritesh
>>
>
>
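
For reference, one possible shape of the grouped bitmap-freeing logic Ritesh
describes above might look like the sketch below. This is only an assumption
about the direction, not the actual follow-up patch; it simply gathers the
calls already quoted in this thread so the whole block could later move into
ext4_mb_mark_context() as one unit:

    ext4_lock_group(sb, block_group);
    mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
    ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
    ext4_free_group_clusters_set(sb, gdp, ret);
    ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
    ext4_group_desc_csum_set(sb, block_group, gdp);
    ext4_unlock_group(sb, block_group);

    if (sbi->s_log_groups_per_flex) {
            ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
            atomic64_add(count_clusters,
                         &sbi_array_rcu_deref(sbi, s_flex_groups,
                                              flex_group)->free_clusters);
    }

    /* dirty the bitmap and group descriptor right next to the update */
    BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
    err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
    BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
    ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
    if (!err)
            err = ret;
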