Re: [PATCH v4 3/3] fs: introduce getfsxattrat and setfsxattrat syscalls
From: Andrey Albershteyn
Date: Fri Apr 25 2025 - 14:17:23 EST
On 2025-04-22 17:14:10, Christian Brauner wrote:
> On Tue, Apr 22, 2025 at 04:31:29PM +0200, Christian Brauner wrote:
> > On Thu, Mar 27, 2025 at 12:39:28PM +0100, Amir Goldstein wrote:
> > > On Thu, Mar 27, 2025 at 10:33 AM Andrey Albershteyn <aalbersh@xxxxxxxxxx> wrote:
> > > >
> > > > On 2025-03-23 09:56:25, Amir Goldstein wrote:
> > > > > On Fri, Mar 21, 2025 at 8:49 PM Andrey Albershteyn <aalbersh@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > From: Andrey Albershteyn <aalbersh@xxxxxxxxxx>
> > > > > >
> > > > > > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > > > > > extended attributes/flags. The syscalls take parent directory fd and
> > > > > > path to the child together with struct fsxattr.
> > > > > >
> > > > > > This is an alternative to FS_IOC_FSSETXATTR ioctl with a difference
> > > > > > that file don't need to be open as we can reference it with a path
> > > > > > instead of fd. By having this we can manipulated inode extended
> > > > > > attributes not only on regular files but also on special ones. This
> > > > > > is not possible with FS_IOC_FSSETXATTR ioctl as with special files
> > > > > > we can not call ioctl() directly on the filesystem inode using fd.
> > > > > >
> > > > > > This patch adds two new syscalls which allows userspace to get/set
> > > > > > extended inode attributes on special files by using parent directory
> > > > > > and a path - *at() like syscall.
> > > > > >
> > > > > > CC: linux-api@xxxxxxxxxxxxxxx
> > > > > > CC: linux-fsdevel@xxxxxxxxxxxxxxx
> > > > > > CC: linux-xfs@xxxxxxxxxxxxxxx
> > > > > > Signed-off-by: Andrey Albershteyn <aalbersh@xxxxxxxxxx>
> > > > > > Acked-by: Arnd Bergmann <arnd@xxxxxxxx>
> > > > > > ---
> > > > > ...
> > > > > > +SYSCALL_DEFINE5(setfsxattrat, int, dfd, const char __user *, filename,
> > > > > > + struct fsxattr __user *, ufsx, size_t, usize,
> > > > > > + unsigned int, at_flags)
> > > > > > +{
> > > > > > + struct fileattr fa;
> > > > > > + struct path filepath;
> > > > > > + int error;
> > > > > > + unsigned int lookup_flags = 0;
> > > > > > + struct filename *name;
> > > > > > + struct mnt_idmap *idmap;.
> > > > >
> > > > > > + struct dentry *dentry;
> > > > > > + struct vfsmount *mnt;
> > > > > > + struct fsxattr fsx = {};
> > > > > > +
> > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) < FSXATTR_SIZE_VER0);
> > > > > > + BUILD_BUG_ON(sizeof(struct fsxattr) != FSXATTR_SIZE_LATEST);
> > > > > > +
> > > > > > + if ((at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + if (!(at_flags & AT_SYMLINK_NOFOLLOW))
> > > > > > + lookup_flags |= LOOKUP_FOLLOW;
> > > > > > +
> > > > > > + if (at_flags & AT_EMPTY_PATH)
> > > > > > + lookup_flags |= LOOKUP_EMPTY;
> > > > > > +
> > > > > > + if (usize > PAGE_SIZE)
> > > > > > + return -E2BIG;
> > > > > > +
> > > > > > + if (usize < FSXATTR_SIZE_VER0)
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + error = copy_struct_from_user(&fsx, sizeof(struct fsxattr), ufsx, usize);
> > > > > > + if (error)
> > > > > > + return error;
> > > > > > +
> > > > > > + fsxattr_to_fileattr(&fsx, &fa);
> > > > > > +
> > > > > > + name = getname_maybe_null(filename, at_flags);
> > > > > > + if (!name) {
> > > > > > + CLASS(fd, f)(dfd);
> > > > > > +
> > > > > > + if (fd_empty(f))
> > > > > > + return -EBADF;
> > > > > > +
> > > > > > + idmap = file_mnt_idmap(fd_file(f));
> > > > > > + dentry = file_dentry(fd_file(f));
> > > > > > + mnt = fd_file(f)->f_path.mnt;
> > > > > > + } else {
> > > > > > + error = filename_lookup(dfd, name, lookup_flags, &filepath,
> > > > > > + NULL);
> > > > > > + if (error)
> > > > > > + return error;
> > > > > > +
> > > > > > + idmap = mnt_idmap(filepath.mnt);
> > > > > > + dentry = filepath.dentry;
> > > > > > + mnt = filepath.mnt;
> > > > > > + }
> > > > > > +
> > > > > > + error = mnt_want_write(mnt);
> > > > > > + if (!error) {
> > > > > > + error = vfs_fileattr_set(idmap, dentry, &fa);
> > > > > > + if (error == -ENOIOCTLCMD)
> > > > > > + error = -EOPNOTSUPP;
> > > > >
> > > > > This is awkward.
> > > > > vfs_fileattr_set() should return -EOPNOTSUPP.
> > > > > ioctl_setflags() could maybe convert it to -ENOIOCTLCMD,
> > > > > but looking at similar cases ioctl_fiemap(), ioctl_fsfreeze() the
> > > > > ioctl returns -EOPNOTSUPP.
> > > > >
> > > > > I don't think it is necessarily a bad idea to start returning
> > > > > -EOPNOTSUPP instead of -ENOIOCTLCMD for the ioctl
> > > > > because that really reflects the fact that the ioctl is now implemented
> > > > > in vfs and not in the specific fs.
> > > > >
> > > > > and I think it would not be a bad idea at all to make that change
> > > > > together with the merge of the syscalls as a sort of hint to userspace
> > > > > that uses the ioctl, that the sycalls API exists.
> > > > >
> > > > > Thanks,
> > > > > Amir.
> > > > >
> > > >
> > > > Hmm, not sure what you're suggesting here. I see it as:
> > > > - get/setfsxattrat should return EOPNOTSUPP as it make more sense
> > > > than ENOIOCTLCMD
> > > > - ioctl_setflags returns ENOIOCTLCMD which also expected
> > > >
> > > > Don't really see a reason to change what vfs_fileattr_set() returns
> > > > and then copying this if() to other places or start returning
> > > > EOPNOTSUPP.
> > >
> > > ENOIOCTLCMD conceptually means that the ioctl command is unknown
> > > This is not the case since ->fileattr_[gs]et() became a vfs API
> >
> > vfs_fileattr_{g,s}et() should not return ENOIOCTLCMD. Change the return
> > code to EOPNOTSUPP and then make EOPNOTSUPP be translated to ENOTTY on
> > on overlayfs and to ENOIOCTLCMD in ecryptfs and in fs/ioctl.c. This way
> > we get a clean VFS api while retaining current behavior. Amir can do his
> > cleanup based on that.
>
> Also this get/set dance is not something new apis should do. It should
> be handled like setattr_prepare() or generic_fillattr() where the
> filesystem calls a VFS helper and that does all of this based on the
> current state of the inode instead of calling into the filesystem twice:
>
> int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
> struct fileattr *fa)
> {
> <snip>
> inode_lock(inode);
> err = vfs_fileattr_get(dentry, &old_ma);
> if (!err) {
> /* initialize missing bits from old_ma */
> if (fa->flags_valid) {
> <snip>
> err = fileattr_set_prepare(inode, &old_ma, fa);
> if (!err && !security_inode_setfsxattr(inode, fa))
> err = inode->i_op->fileattr_set(idmap, dentry, fa);
>
You mean something like this? (not all fs are done)
--