Re: [PATCH] ext4: Fix dir corruption when ext4_dx_add_entry() fails

From: Jan Kara
Date: Fri Sep 09 2022 - 06:08:51 EST


On Fri 09-09-22 14:27:36, Zhihao Cheng wrote:
> Following process may lead to fs corruption:
> 1. ext4_create(dir/foo)
> ext4_add_nondir
> ext4_add_entry
> ext4_dx_add_entry
> a. add_dirent_to_buf
> ext4_mark_inode_dirty
> ext4_handle_dirty_metadata // dir inode bh is recorded into journal
> b. ext4_append // dx_get_count(entries) == dx_get_limit(entries)
> ext4_bread(EXT4_GET_BLOCKS_CREATE)
> ext4_getblk
> ext4_map_blocks
> ext4_ext_map_blocks
> ext4_mb_new_blocks
> dquot_alloc_block
> dquot_alloc_space_nodirty
> inode_add_bytes // update dir's i_blocks
> ext4_ext_insert_extent
> ext4_ext_dirty // record extent bh into journal
> ext4_handle_dirty_metadata(bh)
> // record new block into journal
> inode->i_size += inode->i_sb->s_blocksize // new size(in mem)
> c. ext4_handle_dirty_dx_node(bh2)
> // record dir's new block(dx_node) into journal
> d. ext4_handle_dirty_dx_node((frame - 1)->bh)
> e. ext4_handle_dirty_dx_node(frame->bh)
> f. do_split // ret err!
> g. add_dirent_to_buf
> ext4_mark_inode_dirty(dir) // update raw_inode on disk(skipped)
> 2. fsck -a /dev/sdb
> drop last block(dx_node) which beyonds dir's i_size.
> /dev/sdb: recovering journal
> /dev/sdb contains a file system with errors, check forced.
> /dev/sdb: Inode 12, end of extent exceeds allowed value
> (logical block 128, physical block 3938, len 1)
> 3. fsck -fn /dev/sdb
> dx_node->entry[i].blk > dir->i_size
> Pass 2: Checking directory structure
> Problem in HTREE directory inode 12 (/dir): bad block number 128.
> Clear HTree index? no
> Problem in HTREE directory inode 12: block #3 has invalid depth (2)
> Problem in HTREE directory inode 12: block #3 has bad max hash
> Problem in HTREE directory inode 12: block #3 not referenced
>
> Just like make_indexed_dir() does, update dir inode if error occurs.
> Fetch a reproducer in [Link].
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216466
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>

Thanks for the analysis and the fix! In principle it looks fine but overall
we seem to be defering directory dirtying too much in ext4 directory
handling code and as a result things are too fragile as you've noticed. By
looking through fs/ext4/namei.c I've found several more places that have
exactly the same problem as you're fixing here. I think that specifically
for these problems with ext4_append() the best solution is to mark inode
dirty directly inside ext4_append(). Sure it will result in more copying of
inode data into the journal but growing directory size is not that
performance critical operation so I think the code simplicity is worth the
extra CPU cycles.

Honza

> ---
> fs/ext4/namei.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 3a31b662f661..f04871fa4ead 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -2617,6 +2617,13 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
> */
> if (restart && err == 0)
> goto again;
> + /*
> + * Even if the dx_add_entry failed, we have to properly write
> + * out all the changes we did so far. Otherwise we can end up
> + * with corrupted filesystem.
> + */
> + if (err)
> + ext4_mark_inode_dirty(handle, dir);
> return err;
> }
>
> --
> 2.31.1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR