Re: [PATCH v3] introduce sys_syncfs to sync a single file system

From: Indan Zupancic
Date: Fri Mar 11 2011 - 06:01:17 EST


Hello,

On Thu, March 10, 2011 20:31, Sage Weil wrote:
> It is frequently useful to sync a single file system, instead of all
> mounted file systems via sync(2):
>
> - On machines with many mounts, it is not at all uncommon for some of
> them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
> those and may never get to the one you do care about (e.g., /).
> - Some applications write lots of data to the file system and then
> want to make sure it is flushed to disk. Calling fsync(2) on each
> file introduces unnecessary ordering constraints that result in a large
> amount of sub-optimal writeback/flush/commit behavior by the file
> system.
>
> There are currently two ways (that I know of) to sync a single super_block:
>
> - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
> mapping, which isn't usually desirable, and doesn't work for non-block
> file systems.
> - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
> current implementation. Relying on this little-known side effect for
> something like data safety sounds foolish.
>
> Both of these approaches require root privileges, which some applications
> do not have (nor should they need?) given that sync(2) is an unprivileged
> operation.
>
> This patch introduces a new system call syncfs(2) that takes an fd and
> syncs only the file system it references. Maybe someday we can
>
> $ sync /some/path
>
> and not get
>
> sync: ignoring all arguments
>
> The syscall is motivated by comments by Al and Christoph at the last LSF.
> syncfs(2) seems like an appropriate name given statfs(2).
>
> A similar ioctl was also proposed a while back, see
> http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

The patch there seems much more reasonable than introducing a whole
new system call just for 20 lines of kernel code. New system calls are
added too easily nowadays.

As an alternative to the ioctl, I propose extending sync_file_range()
instead. E.g. add a SYNC_FILE_MOUNT flag that syncs the whole file
system the fd lives on, accepted either on any fd on the mount or only
on the root dir fd. sync_file_range() is already non-standard and close
enough in purpose that it can implement this behaviour too.
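
From userspace that could look roughly like the sketch below. Note that
SYNC_FILE_MOUNT and its value 8 are only the flag proposed in this mail,
not something any released kernel header defines, and sync_mount() is a
made-up helper name used purely for illustration.

```c
#define _GNU_SOURCE		/* for sync_file_range() */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * ASSUMPTION: the flag proposed in this mail, with the value used in
 * the patch. It is not defined by any released kernel header.
 */
#ifndef SYNC_FILE_MOUNT
#define SYNC_FILE_MOUNT 8
#endif

/*
 * Hypothetical helper: sync the file system that contains 'path'.
 * Returns 0 on success or a negative errno value.
 */
int sync_mount(const char *path)
{
	int fd, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* offset and nbytes do not matter when the whole mount is synced */
	if (sync_file_range(fd, 0, 0, SYNC_FILE_MOUNT) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

On a kernel without the patch, sync_mount("/") should fail with -EINVAL
because the flag is not part of VALID_FLAGS, so applications could use
that to detect support and fall back to sync(2).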

Greetings,

Indan

---

Something like:

diff --git a/fs/sync.c b/fs/sync.c
index ba76b96..9fa073c 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -18,7 +18,7 @@
 #include "internal.h"

 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
-			SYNC_FILE_RANGE_WAIT_AFTER)
+			SYNC_FILE_RANGE_WAIT_AFTER|SYNC_FILE_MOUNT)

 /*
  * Do the filesystem syncing work. For simple filesystems
@@ -330,6 +330,15 @@ SYSCALL_DEFINE(sync_file_range)(int fd, loff_t offset, loff_t nbytes,
 	}

 	ret = 0;
+	if (flags & SYNC_FILE_MOUNT) {
+		struct super_block *sb;
+
+		sb = file->f_dentry->d_sb;
+		down_read(&sb->s_umount);
+		ret = sync_filesystem(sb);
+		up_read(&sb->s_umount);
+		goto out_put;
+	}
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = filemap_fdatawait_range(mapping, offset, endbyte);
 		if (ret < 0)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e38b50a..53e427e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -373,6 +373,7 @@ struct inodes_stat_t {
 #define SYNC_FILE_RANGE_WAIT_BEFORE 1
 #define SYNC_FILE_RANGE_WRITE 2
 #define SYNC_FILE_RANGE_WAIT_AFTER 4
+#define SYNC_FILE_MOUNT 8

 #ifdef __KERNEL__


