Re: [RFC][PATCH v2] ext4: add expose_stale_data flag in fallocate

From: Zheng Liu
Date: Sun May 06 2012 - 23:37:29 EST


On Thu, May 03, 2012 at 06:09:48AM -0400, Ric Wheeler wrote:
> On 05/03/2012 12:14 AM, Zheng Liu wrote:
> >Hi all,
> >
> >Here is the v2 of FALLOC_FL_NO_HIDE_STALE in fallocate. Now no new flag is
> >added into vfs in order to reduce the impacts and avoid a huge security hole.
> >The application cannot call fallocate with a new flag to create an unwritten
> >extent. It needs to call ioctl to enable/disable this feature. Meanwhile, in
> >ioctl, filesystem will check CAP_SYS_RAWIO to ensure that the user has a
> >privilege to switch on/off it. Currently, I only implement it in ext4.
>
> Hi Zheng,
>
> I thought that we had pretty much decided to try and fix the ext4
> performance impact, not pursue this?

Hi Ric,

Sorry for the delay reply. I fully agree with you about trying to improve the
ext4 performance. But maybe it is useful for someone who quite needs to
avoid this overhead and can afford this huge security hole. Just leave
this patch in the mailing list. :-)

Regards,
Zheng

>
> >
> >Even though I try to reduce its impact, this feature still brings a security
> >hole. So the application must ensure that it initializes an unwritten extent
> >by itself before reading it, and it is used in a limited environment.
> >
> >v1 -> v2:
> >* remove FALLOC_FL_NO_HIDE_STALE flag in vfs
> >* add 'i_expose_stale_data' in ext4 to enable/disable it
> >
> >Regards,
> >Zheng
> >
> > From 530045b4a1f75df5afd40c0e20c89917f97d7d5a Mon Sep 17 00:00:00 2001
> >From: Zheng Liu<wenqing.lz@xxxxxxxxxx>
> >Date: Thu, 3 May 2012 11:21:44 +0800
> >Subject: [RFC][PATCH v2] ext4: add expose_stale_data flag in fallocate
> >
> >We can use fallocate to preallocate sequential blocks. But these extents are
> >uninitialized. So when the application does a write, filesystem will
> >initialized these unwritten extents and it brings a huge overhead in some
> >cases.
> >
> >This patch adds a new flag in inode to indicate whether initialize an unwritten
> >extent or not. This flag is enable/disable within ioctl that switch on/off this
> >feature. The application must call ioctl to enable this feature before it tries
> >to preallocate some blocks.
> >
> >Obviously, this feature brings a huge security hole. The application must
> >guarantee to initialize this file by itself before reading it at the same
> >offset. So the application *MUST* use it carefully.
> >
> >CC: Ric Wheeler<rwheeler@xxxxxxxxxx>
> >CC: Eric Sandeen<sandeen@xxxxxxxxxx>
> >CC: Ted Ts'o tytso@xxxxxxx>
> >CC: Dave Chinner<david@xxxxxxxxxxxxx>
> >CC: Lukas Czerner<lczerner@xxxxxxxxxx>
> >CC: Andreas Dilger<adilger@xxxxxxxxxxxxx>
> >CC: Szabolcs Szakacsits<szaka@xxxxxxxxxx>
> >Signed-off-by: Zheng Liu<wenqing.lz@xxxxxxxxxx>
> >---
> > fs/ext4/ext4.h | 5 +++++
> > fs/ext4/extents.c | 6 +++++-
> > fs/ext4/ioctl.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> > fs/ext4/super.c | 1 +
> > 4 files changed, 54 insertions(+), 1 deletions(-)
> >
> >diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> >index 0e01e90..f56caf0 100644
> >--- a/fs/ext4/ext4.h
> >+++ b/fs/ext4/ext4.h
> >@@ -593,6 +593,8 @@ enum {
> > #define EXT4_IOC_ALLOC_DA_BLKS _IO('f', 12)
> > #define EXT4_IOC_MOVE_EXT _IOWR('f', 15, struct move_extent)
> > #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
> >+#define EXT4_IOC_GET_EXPOSE_STALE _IOR('f', 17, int)
> >+#define EXT4_IOC_SET_EXPOSE_STALE _IOW('f', 18, int)
> >
> > #if defined(__KERNEL__)&& defined(CONFIG_COMPAT)
> > /*
> >@@ -908,6 +910,9 @@ struct ext4_inode_info {
> > */
> > tid_t i_sync_tid;
> > tid_t i_datasync_tid;
> >+
> >+ /* expose stale data in creating a new extent */
> >+ int i_expose_stale_data;
> > };
> >
> > /*
> >diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >index abcdeab..14f54f1 100644
> >--- a/fs/ext4/extents.c
> >+++ b/fs/ext4/extents.c
> >@@ -4269,6 +4269,7 @@ static void ext4_falloc_update_inode(struct inode *inode,
> > long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > {
> > struct inode *inode = file->f_path.dentry->d_inode;
> >+ struct ext4_inode_info *ei = EXT4_I(inode);
> > handle_t *handle;
> > loff_t new_size;
> > unsigned int max_blocks;
> >@@ -4312,7 +4313,10 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> > trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> > return ret;
> > }
> >- flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
> >+ if (ei->i_expose_stale_data)
> >+ flags = EXT4_GET_BLOCKS_CREATE;
> >+ else
> >+ flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
> > if (mode& FALLOC_FL_KEEP_SIZE)
> > flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
> > /*
> >diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> >index 6eee255..a37db8e 100644
> >--- a/fs/ext4/ioctl.c
> >+++ b/fs/ext4/ioctl.c
> >@@ -432,6 +432,47 @@ resizefs_out:
> > return 0;
> > }
> >
> >+ case EXT4_IOC_GET_EXPOSE_STALE: {
> >+ int enable;
> >+
> >+ /* security check */
> >+ if (!capable(CAP_SYS_RAWIO))
> >+ return -EPERM;
> >+
> >+ /*
> >+ * currently only extent-based files support (pre)allocate with
> >+ * EXPOSE_STALE_DATA flag
> >+ */
> >+ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
> >+ return -EOPNOTSUPP;
> >+
> >+ enable = ei->i_expose_stale_data;
> >+
> >+ return put_user(enable, (int __user *) arg);
> >+ }
> >+
> >+ case EXT4_IOC_SET_EXPOSE_STALE: {
> >+ int enable;
> >+
> >+ /* security check */
> >+ if (!capable(CAP_SYS_RAWIO))
> >+ return -EPERM;
> >+
> >+ /*
> >+ * currently only extent-based files support (pre)allocate with
> >+ * EXPOSE_STALE_DATA flag
> >+ */
> >+ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
> >+ return -EOPNOTSUPP;
> >+
> >+ if (get_user(enable, (int __user *) arg))
> >+ return -EFAULT;
> >+
> >+ ei->i_expose_stale_data = enable;
> >+
> >+ return 0;
> >+ }
> >+
> > default:
> > return -ENOTTY;
> > }
> >@@ -495,6 +536,8 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> > case EXT4_IOC_MOVE_EXT:
> > case FITRIM:
> > case EXT4_IOC_RESIZE_FS:
> >+ case EXT4_IOC_GET_EXPOSE_STALE:
> >+ case EXT4_IOC_SET_EXPOSE_STALE:
> > break;
> > default:
> > return -ENOIOCTLCMD;
> >diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >index e1fb1d5..6de2db0 100644
> >--- a/fs/ext4/super.c
> >+++ b/fs/ext4/super.c
> >@@ -943,6 +943,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> > ei->i_datasync_tid = 0;
> > atomic_set(&ei->i_ioend_count, 0);
> > atomic_set(&ei->i_aiodio_unwritten, 0);
> >+ ei->i_expose_stale_data = 0;
> >
> > return&ei->vfs_inode;
> > }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/