Re: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid

From: Chao Yu
Date: Fri Aug 21 2015 - 11:07:15 EST

Next message: Arnaldo Carvalho de Melo: "Re: [PATCH V8 10/25] perf tools: Add example call-graph script"
Previous message: Dan Williams: "Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory""
In reply to: Chao Yu: "RE: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid"
Next in thread: Jaegeuk Kim: "[PATCH 3/5] f2fs: retry gc if one section is not successfully reclaimed"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Aug 21, 2015, at 8:48 PM, Chao Yu <chao2.yu@xxxxxxxxxxx> wrote:
>
> Hi Jaegeuk,
>
>> -----Original Message-----
>> From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
>> Sent: Thursday, August 20, 2015 11:35 PM
>> To: Chao Yu
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
>> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
>> Subject: Re: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid
>>
>> On Thu, Aug 20, 2015 at 05:12:03PM +0800, Chao Yu wrote:
>>> Hi Jaegeuk,
>>>
>>>> -----Original Message-----
>>>> From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
>>>> Sent: Tuesday, August 18, 2015 4:46 PM
>>>> To: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
>>>> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
>>>> Cc: Jaegeuk Kim
>>>> Subject: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid
>>>>
>>>> This patch adds a routine which checks the block address of newly allocated nid.
>>>> If an nid has already allocated by other thread due to subtle data races, it
>>>> will result in filesystem corruption.
>>>> So, it needs to check whether its block address was already allocated or not
>>>> in prior to nid allocation as the last chance.
>>>>
>>>> Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
>>>> ---
>>>> fs/f2fs/node.c | 9 +++++++++
>>>> 1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>> index 3cc32b8..6bef5a2 100644
>>>> --- a/fs/f2fs/node.c
>>>> +++ b/fs/f2fs/node.c
>>>> @@ -1573,6 +1573,8 @@ retry:
>>>>
>>>> /* We should not use stale free nids created by build_free_nids */
>>>> if (nm_i->fcnt && !on_build_free_nids(nm_i)) {
>>>> + struct node_info ni;
>>>> +
>>>> f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
>>>> list_for_each_entry(i, &nm_i->free_nid_list, list)
>>>> if (i->state == NID_NEW)
>>>> @@ -1583,6 +1585,13 @@ retry:
>>>> i->state = NID_ALLOC;
>>>> nm_i->fcnt--;
>>>> spin_unlock(&nm_i->free_nid_list_lock);
>>>> +
>>>> + /* check nid is allocated already */
>>>> + get_node_info(sbi, *nid, &ni);
>>>> + if (ni.blk_addr != NULL_ADDR) {
>>>
>>> I didn't get it, why free nid is with non-NULL blkaddr?
>>> Could you please explain more about this?
>>
>> As I wrote in the description, I've been suffering from wrongly added free nids
>> which results in fs corruption. I suspect somewhat race condition in
>> build_free_nids, but it is very subtle to figure out exactly.
>> So, I wrote this patch to fix that.
>>
>> The concern would be performance regarding to cold cache miss at an NAT entry.
>> However, I expect that it would be tolerable since get_node_info will be called
>> after alloc_nid later.
>
> After investigating, I think I can reproduce this bug:
>
> 1. touch a (nid = 4) & touch b (nid = 5)
> 2. sync
> 3. rm a & rm b
> a) rm a to make next_scan_nid = 4.
> b) I change the logical of f2fs code making remove_inode_page failed when
> file b is being removed, so file b's nat entry is not set dirty;
> 4. sync
> 5. touch 1815 files
> 6. echo 3 > /proc/sys/vm/drop_caches
> drop clean nat entry of inode (nid:5), it makes we can pass blkaddr
> verification in add_free_nid:
> if (build) {
> /* do not add allocated nids */
>
> 7. touch c
> because there is no free nids in cache, we try to build cache by two steps:
> a) build nids by loading from nat pages;
> b) build nids by loading from curseg and try to unload nids which has valid
> blkaddr in curseg.
>
> unfortunately, our build operation is not atomic, so after step a), nid:5

After rethinking about this issue on the way coming back home, I find that
it seems not right here, because we will try to check build_lock status in
on_build_free_nids, allocation will not happen during building free nid
cache. I missed that previously.

Sorry for my wrong conclusion, please ignore them. :(

Iâd like to reinvestigate this issue.

Thanks,

> should be in free nids cache and it should be removed in step b). So all
> free nids allocated between step a) and step b) can be risky of incorrect
> allocation.
>
> If I'm not miss something, the root casue looks like our recent change:
> allocate free nid aggressively.
>
> Thanks,
>>
>>>
>>>> + alloc_nid_done(sbi, *nid);
>>>
>>> Will another thread call alloc_nid_done too, making this free nid being
>>> released again?
>>
>> No, its state became NID_ALLOC, so no other thread can pick this up till
>> alloc_nid_done is called.
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>> + goto retry;
>>>> + }
>>>> return true;
>>>> }
>>>> spin_unlock(&nm_i->free_nid_list_lock);
>>>> --
>>>> 2.1.1
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Linux-f2fs-devel mailing list
>>>> Linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
>>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Arnaldo Carvalho de Melo: "Re: [PATCH V8 10/25] perf tools: Add example call-graph script"
Previous message: Dan Williams: "Re: [RFC PATCH 1/7] x86, mm: ZONE_DEVICE for "device memory""
In reply to: Chao Yu: "RE: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid"
Next in thread: Jaegeuk Kim: "[PATCH 3/5] f2fs: retry gc if one section is not successfully reclaimed"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]