Re: [PATCH 1/2] f2fs: Fix mount failure due to SPO after a successful online resize FS

From: Chao Yu
Date: Tue Mar 03 2020 - 07:09:49 EST


Hi Sahitya,

On 2020/3/2 12:39, Sahitya Tummala wrote:
> Hi Chao,
>
> On Fri, Feb 28, 2020 at 04:35:37PM +0800, Chao Yu wrote:
>> Hi Sahitya,
>>
>> Good catch.
>>
>> On 2020/2/27 18:39, Sahitya Tummala wrote:
>>> Even though online resize completes successfully, an SPO (sudden power-off)
>>> immediately after the resize still causes the error below on the next mount.
>>>
>>> [ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
>>> [ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
>>>
>>> This is because, after the FS metadata is updated in update_fs_metadata(),
>>> if SBI_IS_DIRTY is not set, no CP is done to reflect the new
>>> user_block_count.
>>>
>>> Signed-off-by: Sahitya Tummala <stummala@xxxxxxxxxxxxxx>
>>> ---
>>> fs/f2fs/gc.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index a92fa49..a14a75f 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1577,6 +1577,7 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
>>>
>>> update_fs_metadata(sbi, -secs);
>>> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
>>
>> Do we need a barrier here to keep the ordering between the code above and
>> set_sbi_flag(DIRTY)?
>
> I don't think a barrier will help here. Let us say there is another context
> doing CP already; then it races with update_fs_metadata(), so it may or may not
> see the resize updates, and it will also clear the SBI_IS_DIRTY flag set by resize
> (even with a barrier).

Agreed. We didn't consider the race condition between CP and
update_fs_metadata(); it should be fixed.
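
To make the problem concrete, here is a minimal userspace sketch of the
ordering issue. It is not kernel code: fs_model, model_checkpoint() and
model_resize() are hypothetical names, a pthread mutex stands in for
cp_mutex, and a plain bool stands in for SBI_IS_DIRTY. It assumes the
checkpoint only writes when the dirty flag is set, as described in the
commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are f2fs symbols. */
struct fs_model {
	pthread_mutex_t cp_lock; /* stands in for cp_mutex */
	bool dirty;              /* stands in for SBI_IS_DIRTY */
	long long mem_blocks;    /* in-memory user_block_count */
	long long disk_blocks;   /* value recorded by the last checkpoint */
};

/* Checkpoint: persists the in-memory value only if the dirty flag is set. */
static bool model_checkpoint(struct fs_model *fs)
{
	bool wrote = false;

	pthread_mutex_lock(&fs->cp_lock);
	if (fs->dirty) {
		fs->disk_blocks = fs->mem_blocks;
		fs->dirty = false;
		wrote = true;
		/* After a write, disk and memory must agree. */
		assert(fs->disk_blocks == fs->mem_blocks);
	}
	pthread_mutex_unlock(&fs->cp_lock);
	return wrote;
}

/*
 * Resize: the metadata update and the dirty marking happen under the
 * same lock the checkpoint takes, so a checkpoint cannot run in between.
 */
static void model_resize(struct fs_model *fs, long long delta, bool mark_dirty)
{
	pthread_mutex_lock(&fs->cp_lock);
	fs->mem_blocks += delta;
	if (mark_dirty)
		fs->dirty = true;
	pthread_mutex_unlock(&fs->cp_lock);
}
```

In this model, calling model_resize() with mark_dirty == false and then
model_checkpoint() leaves disk_blocks at the old value, which corresponds
to the stale user_block_count seen after SPO.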

>
> I think we need to synchronize this with CP context, so that these resize changes
> will be reflected properly. Please see the new diff below and help with the review.
>
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index a14a75f..5554af8 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1467,6 +1467,7 @@ static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs)
> long long user_block_count =
> le64_to_cpu(F2FS_CKPT(sbi)->user_block_count);
>
> + clear_sbi_flag(sbi, SBI_IS_DIRTY);

Why clear the dirty flag here?

And why not use cp_mutex to protect update_fs_metadata() in the error path of
f2fs_sync_fs() below?

> SM_I(sbi)->segment_count = (int)SM_I(sbi)->segment_count + segs;
> MAIN_SEGS(sbi) = (int)MAIN_SEGS(sbi) + segs;
> FREE_I(sbi)->free_sections = (int)FREE_I(sbi)->free_sections + secs;
> @@ -1575,9 +1576,12 @@ int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count)
> goto out;
> }
>
> + mutex_lock(&sbi->cp_mutex);
> update_fs_metadata(sbi, -secs);
> clear_sbi_flag(sbi, SBI_IS_RESIZEFS);
> set_sbi_flag(sbi, SBI_IS_DIRTY);
> + mutex_unlock(&sbi->cp_mutex);
> +
> err = f2fs_sync_fs(sbi->sb, 1);
> if (err) {
> update_fs_metadata(sbi, secs);

^^^^^^^^^^^^^^
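
For the error path marked above, a rough userspace sketch of what I mean
follows. Again, this is not kernel code: resize_model, locked_update(),
model_shrink() and the sync_fn callback are hypothetical names, with a
pthread mutex standing in for cp_mutex and sync_fn standing in for
f2fs_sync_fs(). Marking the state dirty on rollback is an assumption of
the sketch, not something decided in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are f2fs symbols. */
struct resize_model {
	pthread_mutex_t cp_lock; /* stands in for cp_mutex */
	bool dirty;              /* stands in for SBI_IS_DIRTY */
	long long sections;      /* stands in for the in-memory metadata */
};

/* Apply a metadata delta and mark dirty, both under the lock. */
static void locked_update(struct resize_model *m, long long secs)
{
	pthread_mutex_lock(&m->cp_lock);
	m->sections += secs;
	m->dirty = true; /* assumption: rollback also marks dirty */
	pthread_mutex_unlock(&m->cp_lock);
}

/*
 * Shrink by 'secs' sections. 'sync_fn' stands in for f2fs_sync_fs() and
 * returns 0 on success. On failure the update is reverted under the
 * same lock, so a concurrent checkpoint never sees a half-reverted state.
 */
static int model_shrink(struct resize_model *m, long long secs,
			int (*sync_fn)(struct resize_model *))
{
	int err;

	assert(secs > 0);
	locked_update(m, -secs);
	err = sync_fn(m);
	if (err)
		locked_update(m, secs); /* the error path takes the lock too */
	return err;
}

/* Stand-in for a successful sync: clears the dirty flag under the lock. */
static int sync_ok(struct resize_model *m)
{
	pthread_mutex_lock(&m->cp_lock);
	m->dirty = false;
	pthread_mutex_unlock(&m->cp_lock);
	return 0;
}

/* Stand-in for a failed sync: changes nothing and returns an error. */
static int sync_fail(struct resize_model *m)
{
	(void)m;
	return -5;
}
```

With sync_fail() the section count returns to its original value; with
sync_ok() the shrink sticks and the dirty flag is cleared.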

In addition, I found that we forgot to take sb_lock to protect updates to
f2fs_super_block fields; I will submit a patch for that.

Thanks,

>
> thanks,
>
>>
>>> + set_sbi_flag(sbi, SBI_IS_DIRTY);
>>> err = f2fs_sync_fs(sbi->sb, 1);
>>> if (err) {
>>> update_fs_metadata(sbi, secs);
>>
>> Do we need to add clear_sbi_flag(, SBI_IS_DIRTY) into update_fs_metadata(), so that
>> the above path is covered as well?
>>
>> Thanks,
>>
>>>
>