Re: [rfc] new stat*fs-like syscall?

From: Andy Lutomirski
Date: Thu Jun 24 2010 - 10:15:54 EST


Nick Piggin wrote:
This has come up a few times in the past, and I'd like to try to get
agreement on it. Importantly, statvfs(2) contains f_flag (mount
flags), and its use is encouraged over statfs(2). The kernel
provides only a statfs syscall.

This means glibc has to provide f_flag support by parsing /proc/mounts
and stat(2)ing mount points. This is really slow, and /proc/mounts is
hard for the kernel to provide. It's actually the last scalability
bottleneck in the core vfs for dbench (samba) after my patches.

Not only that, but it's racy.
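
For illustration, the userspace lookup described above could look roughly
like the sketch below. This is not glibc's actual code; path_is_readonly()
is a hypothetical helper, and matching on st_dev is an assumption about how
a mount entry gets tied to a path. The mount table can change between
reading an entry and stat(2)ing it, which is where the race comes from.

```c
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not glibc's implementation): find the mount
 * entries that live on the same device as `path` and report whether
 * the last matching one is mounted read-only.
 * Returns 1 for read-only, 0 for read-write, -1 on error or no match.
 */
int path_is_readonly(const char *path)
{
    struct stat target, st;
    struct mntent *ent;
    FILE *fp;
    int result = -1;

    if (stat(path, &target) != 0)
        return -1;

    fp = setmntent("/proc/self/mounts", "r");
    if (fp == NULL)
        return -1;

    /* One stat(2) per mount table entry: this is the slow part. */
    while ((ent = getmntent(fp)) != NULL) {
        if (stat(ent->mnt_dir, &st) != 0)
            continue;
        /* Later entries shadow earlier ones, so keep the last match. */
        if (st.st_dev == target.st_dev)
            result = hasmntopt(ent, "ro") != NULL;
    }
    endmntent(fp);
    return result;
}
```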

Other than types, other differences are:
- statvfs(2) has f_frsize, which seems fairly useless.
- statvfs(2) has f_favail.
- statfs(2) f_bsize is optimal transfer block, statvfs(2) f_bsize is fs
block size. The latter could be useful for disk space algorithms.
Both can be ill-defined.
- statvfs(2) lacks f_type.
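
To see the differences listed above on a given system, a small sketch that
calls both interfaces on the same path may help. print_fs_info() is a
hypothetical name, and the values printed depend on the filesystem and on
the libc in use.

```c
#include <stdio.h>
#include <sys/statvfs.h>
#include <sys/vfs.h>

/*
 * Hypothetical helper: print the fields that differ between
 * statfs(2) and statvfs(3) for `path`.
 * Returns 0 on success, -1 if either call fails.
 */
int print_fs_info(const char *path)
{
    struct statfs sfs;
    struct statvfs svfs;

    if (statfs(path, &sfs) != 0 || statvfs(path, &svfs) != 0)
        return -1;

    /* Only statfs reports the filesystem type. */
    printf("statfs:  f_type=%#lx f_bsize=%ld\n",
           (unsigned long)sfs.f_type, (long)sfs.f_bsize);

    /* Only statvfs reports f_frsize, f_favail and f_flag. */
    printf("statvfs: f_bsize=%lu f_frsize=%lu f_favail=%llu\n",
           (unsigned long)svfs.f_bsize, (unsigned long)svfs.f_frsize,
           (unsigned long long)svfs.f_favail);
    printf("statvfs: f_flag=%#lx (read-only: %s)\n",
           (unsigned long)svfs.f_flag,
           (svfs.f_flag & ST_RDONLY) ? "yes" : "no");
    return 0;
}
```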

Is there anything more we should add here? Samba wants a capabilities
field, with things like sparse files, quotas, compression, encryption,
case preserving/sensitive.

Any thoughts?

Something like fsid, but actually specified to uniquely identify a superblock. (Currently, fsid seems to be set by the filesystem, and nothing in particular ensures that two different filesystems can't have colliding values.) We could guarantee (or have a flag guaranteeing) that (fsid, st_ino) actually uniquely identifies an inode.
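
As a sketch of what userspace might do with such a guarantee, the code below
builds an (fsid, st_ino) key from today's statfs(2) and stat(2). The
inode_key struct and both helpers are hypothetical names, and with the
current interfaces nothing promises that the key is unique across
filesystems; that missing guarantee is exactly the problem.

```c
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>

/* Hypothetical key type: f_fsid is kept as opaque bytes. */
struct inode_key {
    unsigned char fsid[sizeof(((struct statfs *)0)->f_fsid)];
    ino_t ino;
};

/* Fill `key` for `path`. Returns 0 on success, -1 on failure. */
int get_inode_key(const char *path, struct inode_key *key)
{
    struct statfs sfs;
    struct stat st;

    if (statfs(path, &sfs) != 0 || stat(path, &st) != 0)
        return -1;

    memcpy(key->fsid, &sfs.f_fsid, sizeof(key->fsid));
    key->ino = st.st_ino;
    return 0;
}

/* Nonzero if both keys have the same fsid bytes and inode number. */
int same_inode(const struct inode_key *a, const struct inode_key *b)
{
    return memcmp(a->fsid, b->fsid, sizeof(a->fsid)) == 0 &&
           a->ino == b->ino;
}
```

Two paths that name the same file, such as "/" and "/.", compare equal here; whether two unequal keys always mean different inodes is what the proposed guarantee would settle.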

Similarly, something like fsid that uniquely identifies the vfsmount could be useful, although I don't know how easy that would be to provide for fstat?fs.

If we could expose the complete set of filesystem mount options so that mount(1) didn't have to look at /proc/self/mounts or /etc/mtab, then playing with chroots would be that much easier.

Should we expose superblock and vfsmount options separately? We have read-only bind mounts now, but the way they work is rather inscrutable, and if stat?fs could say "superblock is read-write but vfsmount is read-only" then people might be able to make more sense of what's going on.

--Andy