Re: [dm-devel] [PATCH] deadlock with suspend and quotas

From: Mikulas Patocka
Date: Wed Nov 30 2011 - 07:15:04 EST




On Wed, 30 Nov 2011, Jan Kara wrote:

> On Wed 30-11-11 01:52:22, Mikulas Patocka wrote:
> > > > dm-ioctl.h:
> > > > /*
> > > > * Set this to avoid attempting to freeze any filesystem when suspending.
> > > > */
> > > > #define DM_SKIP_LOCKFS_FLAG (1 << 10) /* In */
> > > Thanks. I was now checking in detail and indeed FIFREEZE fails if
> > > ->freeze_fs is not set. And only xfs, ext3, ext4, reiserfs, jfs, nilfs2,
> > > and gfs2 provide this function. So I was correct in assuming that when
> > > filesystem supports FIFREEZE it must make sure no modifications happen to
> > > the filesystem. So I believe that my original plan for sync to skip frozen
> > > filesystem is correct.
> > >
> > > Honza
> >
> > LVM doesn't suspend with FIFREEZE, it calls freeze_bdev directly from
> > drivers/md/dm.c (and it works for all filesystems, including ext2).
> Ah, I see. Sorry I missed this. But then I can only reiterate that
> drivers/md/dm.c is IMHO broken. Either it cares about filesystem being
> really frozen - and then it should refuse the operation for e.g. ext2
> because it cannot be frozen - or it does not care about filesystem being
> frozen and then there's no point in calling freeze_super(). Possibly, you
> might still want to e.g. try snapshotting even if freeze_super() would
> return EOPNOTSUPP but that should be handled inside dm, not by erroneously
> marking filesystem as frozen when it is not. Or am I still missing
> something?
>
> > So if you skip sync of frozen filesystems, you introduce a data
> > corruption if someone takes a snapshot of ext2.
> Yes, because ext2 cannot really be frozen, it is (erroneously) marked
> as such but it is not frozen...
>
> Honza

The semantics of freeze are "do the best you can". You can't freeze ext2
to a clean state: even if you managed to block all code paths that create
dirty data, it still couldn't be made clean, because you can't get rid of
inodes that are open and deleted.

On non-journaled filesystems, freeze_bdev syncs the filesystem and uses
"vfs_check_frozen" to prevent some code paths (such as
__generic_file_aio_write) from creating more dirty data. That check
doesn't guarantee anything, but it reduces the probability of corruption.

So in the end, all data that was dirty before the snapshot was taken is
guaranteed to be flushed to disk. If someone writes to the ext2
filesystem while the snapshot is being taken, that may create
inconsistencies that must be fixed by running fsck on the snapshot.
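
To make the "best effort" behaviour concrete, here is a toy userspace
model of it. This is only a sketch, not kernel code: every name in it
(toy_fs, toy_freeze, toy_checked_write and so on) is hypothetical, and
the filesystem is reduced to two page counters. It assumes that a write
path that checks the frozen flag never dirties a page while frozen (the
toy refuses the write, where the kernel would make the caller wait), and
that some other path exists that does not check the flag at all.

```c
/* Toy userspace model of a best-effort freeze. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_fs {
    bool frozen;      /* set by toy_freeze(), cleared by toy_thaw() */
    int dirty_pages;  /* modified in memory, not yet on disk */
    int disk_pages;   /* pages that have reached the disk */
};

/* Flush everything that is currently dirty. */
void toy_sync(struct toy_fs *fs)
{
    fs->disk_pages += fs->dirty_pages;
    fs->dirty_pages = 0;
}

/* Best-effort freeze: sync first, then mark the filesystem frozen. */
void toy_freeze(struct toy_fs *fs)
{
    toy_sync(fs);
    fs->frozen = true;
    /* Everything dirty before the freeze is on disk at this point. */
    assert(fs->dirty_pages == 0);
}

void toy_thaw(struct toy_fs *fs)
{
    fs->frozen = false;
}

/*
 * A write path that checks the frozen flag. The toy refuses the write
 * while frozen; returns true if a page was dirtied.
 */
bool toy_checked_write(struct toy_fs *fs)
{
    if (fs->frozen)
        return false;
    fs->dirty_pages++;
    return true;
}

/* A write path with no frozen check: it dirties a page regardless. */
void toy_unchecked_write(struct toy_fs *fs)
{
    fs->dirty_pages++;
}

/*
 * Snapshot what is on disk. Returns the number of pages captured and
 * sets *needs_fsck if dirty data was outstanding at snapshot time.
 */
int toy_snapshot(const struct toy_fs *fs, bool *needs_fsck)
{
    *needs_fsck = fs->dirty_pages > 0;
    return fs->disk_pages;
}
```

In this model, a snapshot taken right after toy_freeze() always contains
the pages that were dirty beforehand. A single toy_unchecked_write()
between the freeze and the snapshot leaves the captured page count
unchanged but sets needs_fsck, which is the ext2 case described above.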

Mikulas