RE: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid

From: Chao Yu
Date: Mon Aug 24 2015 - 05:39:32 EST


Hi Jaegeuk,

> -----Original Message-----
> From: Chao Yu [mailto:yuchaochina@xxxxxxxxxxx]
> Sent: Friday, August 21, 2015 11:00 PM
> To: Chao Yu
> Cc: Jaegeuk Kim; linux-fsdevel@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid
>
> > On Aug 21, 2015, at 8:48 PM, Chao Yu <chao2.yu@xxxxxxxxxxx> wrote:
> >
> > Hi Jaegeuk,
> >
> >> -----Original Message-----
> >> From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
> >> Sent: Thursday, August 20, 2015 11:35 PM
> >> To: Chao Yu
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
> >> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> >> Subject: Re: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated
> nid
> >>
> >> On Thu, Aug 20, 2015 at 05:12:03PM +0800, Chao Yu wrote:
> >>> Hi Jaegeuk,
> >>>
> >>>> -----Original Message-----
> >>>> From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
> >>>> Sent: Tuesday, August 18, 2015 4:46 PM
> >>>> To: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
> >>>> linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx
> >>>> Cc: Jaegeuk Kim
> >>>> Subject: [f2fs-dev] [PATCH 5/5] f2fs: check the node block address of newly allocated nid
> >>>>
> >>>> This patch adds a routine which checks the block address of newly allocated nid.
> >>>> If an nid has already allocated by other thread due to subtle data races, it
> >>>> will result in filesystem corruption.
> >>>> So, it needs to check whether its block address was already allocated or not
> >>>> in prior to nid allocation as the last chance.
> >>>>
> >>>> Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> >>>> ---
> >>>> fs/f2fs/node.c | 9 +++++++++
> >>>> 1 file changed, 9 insertions(+)
> >>>>
> >>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >>>> index 3cc32b8..6bef5a2 100644
> >>>> --- a/fs/f2fs/node.c
> >>>> +++ b/fs/f2fs/node.c
> >>>> @@ -1573,6 +1573,8 @@ retry:
> >>>>
> >>>> /* We should not use stale free nids created by build_free_nids */
> >>>> if (nm_i->fcnt && !on_build_free_nids(nm_i)) {
> >>>> + struct node_info ni;
> >>>> +
> >>>> f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
> >>>> list_for_each_entry(i, &nm_i->free_nid_list, list)
> >>>> if (i->state == NID_NEW)
> >>>> @@ -1583,6 +1585,13 @@ retry:
> >>>> i->state = NID_ALLOC;
> >>>> nm_i->fcnt--;
> >>>> spin_unlock(&nm_i->free_nid_list_lock);
> >>>> +
> >>>> + /* check nid is allocated already */
> >>>> + get_node_info(sbi, *nid, &ni);
> >>>> + if (ni.blk_addr != NULL_ADDR) {
> >>>
> >>> I didn't get it, why free nid is with non-NULL blkaddr?
> >>> Could you please explain more about this?
> >>
> >> As I wrote in the description, I've been suffering from wrongly added free nids
> >> which results in fs corruption. I suspect somewhat race condition in
> >> build_free_nids, but it is very subtle to figure out exactly.
> >> So, I wrote this patch to fix that.
> >>
> >> The concern would be performance regarding to cold cache miss at an NAT entry.
> >> However, I expect that it would be tolerable since get_node_info will be called
> >> after alloc_nid later.
> >
> > After investigating, I think I can reproduce this bug:
> >
> > 1. touch a (nid = 4) & touch b (nid = 5)
> > 2. sync
> > 3. rm a & rm b
> > a) rm a to make next_scan_nid = 4.
> > b) I change the logical of f2fs code making remove_inode_page failed when
> > file b is being removed, so file b's nat entry is not set dirty;
> > 4. sync
> > 5. touch 1815 files
> > 6. echo 3 > /proc/sys/vm/drop_caches
> > drop clean nat entry of inode (nid:5), it makes we can pass blkaddr
> > verification in add_free_nid:
> > if (build) {
> > /* do not add allocated nids */
> >
> > 7. touch c
> > because there is no free nids in cache, we try to build cache by two steps:
> > a) build nids by loading from nat pages;
> > b) build nids by loading from curseg and try to unload nids which has valid
> > blkaddr in curseg.
> >
> > unfortunately, our build operation is not atomic, so after step a), nid:5
>
> After rethinking about this issue on the way coming back home, I find that
> it seems not right here, because we will try to check build_lock status in
> on_build_free_nids, allocation will not happen during building free nid
> cache. I missed that previously.
>
> Sorry for my wrong conclusion, please ignore them. :(
>
> Iâd like to reinvestigate this issue.

I reinvestigate this issue and find one possible call path for reproducing this
issue, and I wrote patches for fxing, can you please help to review the following
patches?

Thanks,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/