Re: [PATCH] ext4: set csum seed in tmp inode while migrating to extents

From: Luís Henriques
Date: Tue Dec 14 2021 - 11:46:45 EST


On Tue, Dec 14, 2021 at 01:03:17PM +0100, Jan Kara wrote:
> On Mon 06-12-21 14:37:33, Luís Henriques wrote:
> > When migrating to extents, the temporary inode will have its own checksum
> > seed. This means that, when swapping the inode's data, the inode checksums
> > will be incorrect.
> >
> > This can be fixed by recalculating the extent checksums, or simply by
> > copying the seed into the temporary inode.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=213357
> > Reported-by: Jeroen van Wolffelaar <jeroen@xxxxxxxxxxxxx>
> > Signed-off-by: Luís Henriques <lhenriques@xxxxxxx>
>
> Thanks for debugging this! Two comments below:

And thanks for the review!

> > diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> > index 7e0b4f81c6c0..dd4ece38fc83 100644
> > --- a/fs/ext4/migrate.c
> > +++ b/fs/ext4/migrate.c
> > @@ -413,7 +413,7 @@ int ext4_ext_migrate(struct inode *inode)
> >  	handle_t *handle;
> >  	int retval = 0, i;
> >  	__le32 *i_data;
> > -	struct ext4_inode_info *ei;
> > +	struct ext4_inode_info *ei, *tmp_ei;
>
> Probably no need for the new tmp_ei variable when you use it only once...

Sure, I'll drop that new variable in v2.

> > @@ -503,6 +503,10 @@ int ext4_ext_migrate(struct inode *inode)
> >  	}
> >
> >  	ei = EXT4_I(inode);
> > +	tmp_ei = EXT4_I(tmp_inode);
> > +	/* Use the right seed for checksumming */
> > +	tmp_ei->i_csum_seed = ei->i_csum_seed;
> > +
>
> I think this is subtly broken in another way: If we crash in the middle of
> migration, tmp_inode (and possibly attached extent tree blocks) will have
> wrong checksums (remember that i_csum_seed is computed from inode number)
> and so orphan cleanup will fail. On the other hand in that case the orphan
> cleanup will free blocks we have already managed to attach to the tmp_inode
> although they are still properly attached to the old 'inode'. So the
> recovery from a crash in the middle of the migration seems to be broken
> anyway. So I guess what you do is an improvement. But can you perhaps:
>
> 1) Move i_csum_seed initialization to a bit earlier in ext4_ext_migrate()
> just after we have got the tmp_inode from ext4_new_inode()? That way all
> inode writes will at least happen with the same csum.
>
> 2) Add a comment you are updating the csum seed so that metadata blocks get
> proper checksum for 'inode' and that recovery from a crash in the middle of
> migration is currently broken.

Obviously, I did not realize the recovery process was broken, and I
appreciate you taking the time to explain _how_ it is broken. I'll add a
new item to (the bottom of) my to-do list and maybe one of these days
I'll get to look into it.

I'll send out v2 shortly, implementing your suggestions.
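
For anyone reading this in the archives, here is a toy userspace model of
the problem, not the kernel code and not the v2 patch. It assumes the
per-inode seed is a CRC32C over the inode number and generation, chained
from a filesystem-wide seed, and that an extent block checksum is a CRC32C
over the block, started from the owning inode's seed. Every toy_* name, the
inode numbers and the filesystem seed are made up for illustration, and the
two inodes are given the same generation only to keep the model simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C update (reflected polynomial 0x82F63B78), no inversion. */
static uint32_t toy_crc32c(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical stand-in for the few inode fields that matter here. */
struct toy_inode {
	uint32_t ino;
	uint32_t generation;
	uint32_t csum_seed;
};

/* Assumed model: the seed depends on the inode number and generation. */
static void toy_init_seed(struct toy_inode *in, uint32_t fs_seed)
{
	uint32_t csum = toy_crc32c(fs_seed, &in->ino, sizeof(in->ino));

	in->csum_seed = toy_crc32c(csum, &in->generation,
				   sizeof(in->generation));
}

/* Checksum of a metadata block as seen through its owning inode. */
static uint32_t toy_block_csum(const struct toy_inode *owner,
			       const void *blk, size_t len)
{
	return toy_crc32c(owner->csum_seed, blk, len);
}

/*
 * Write an extent block through the temporary inode, then verify it
 * through the original inode, as happens once the data is swapped.
 * Returns 1 if the stored checksum still verifies, 0 otherwise.
 */
static int toy_migrate_verifies(int copy_seed)
{
	const uint32_t fs_seed = 0x12345678u;	/* made-up value */
	struct toy_inode inode = { .ino = 12, .generation = 7 };
	struct toy_inode tmp = { .ino = 13, .generation = 7 };
	const char extent_block[] = "toy extent tree block";
	uint32_t stored;

	toy_init_seed(&inode, fs_seed);
	toy_init_seed(&tmp, fs_seed);
	if (copy_seed)
		tmp.csum_seed = inode.csum_seed;	/* the proposed fix */

	stored = toy_block_csum(&tmp, extent_block, sizeof(extent_block));
	return stored == toy_block_csum(&inode, extent_block,
					sizeof(extent_block));
}

/* Usage: both outcomes, checked with assertions. */
static void toy_demo(void)
{
	assert(toy_migrate_verifies(0) == 0);	/* seeds differ: mismatch */
	assert(toy_migrate_verifies(1) == 1);	/* seed copied: verifies */
}
```

The model also shows the flip side Jan points out: once the seed is
copied, anything checksummed through the temporary inode no longer matches
the seed that would be derived from the temporary inode's own number, which
is why orphan cleanup after a crash in mid-migration would fail.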

Cheers,
--
Luís

>
> Thanks!
>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR