Re: [RFC 1/2] fanotify: new event FAN_MODIFY_DIR

From: Amir Goldstein
Date: Tue Mar 14 2017 - 06:11:48 EST


On Tue, Mar 14, 2017 at 1:02 AM, Filip ÅtÄdronskà <r.lklm@xxxxxxxxxx> wrote:
> Fanotify currently does not report directory modification events
> (rename, unlink, etc.). But these events are essential for about half of
> concievable fanotify use cases, especially:
>
> - file system indexers / desktop search tools
> - file synchronization tools (like Dropbox, Nextcloud, etc.),
> online backup tools

This last one is the use case of my employer, Ctera Networks.
Out of curiosity, what is the use case that you are focusing on?

>
> and pretty much any app that needs to maintain and up-to-date internal
> representation of current contents of the file system.
>
> All applications of the above classes that I'm aware of currently use
> recursive inotify watches, which do not scale (my home dir has ~200k
> directories, creating all the watches takes ~2min and eats several tens
> of MB of kernel memory).
>
> There have been many calls for such a feature, pretty much since the
> creation of fanotify in 2009:
> * By GNOME developers:
> https://wiki.gnome.org/BastienNocera/KernelWishlist#VFS.2C_filesystems
> * By KDE developers:
> http://lkml.kernel.org/r/201211011352.42476.Martin%40lichtvoll.de
> 'Better support for (desktop) file search / indexing applications'
> * And others:
> http://lkml.kernel.org/r/AANLkTi=owK=WZW4oNtpm5WpAZhqCQUdTR2K5gzJ_MqZ+%40mail.gmail.com
> 'Fanotify mv/rename?'
> http://lkml.kernel.org/r/1238158043.23703.20.camel%40fatty
> 'Issues with using fanotify for a filesystem indexer'
>

Thanks for sharing this summary!
I had the feeling that all recursive inotify users are hiding in the shadows,
but was missing more concrete evidence.

> Add a new fanotify event, FAN_MODIFY_DIR, that is emitted whenever the
> contents of a directory change (a directory entry is added, removed or
> renamed). This covers all the currently missing events: rename, unlink,
> mknod, mkdir, rmdir, etc.
>
> Included with the event is a file descriptor to the modified directory
> but no further info. The userspace application is expected to rescan the
> whole directory and update its model accordingly. This needs to be done
> carefully to prevent race conditions. A cross-directory rename generates
> two FAN_MODIFY_DIR events.
>

Your approach is interesting and I am glad you shared it with us.
I do like it and it gives me an idea, I am going to prepare a branch
with a subset
of my patches, so you can try them with your userspace sample program.

In comments to your patches I am going to argue that, as a matter of fact,
you can take a small sub set of my patches and get the same functionality
of FAN_MODIFY_DIR, with code that is *less* complex then what you propose.
So please treat my comments as technical comments how FAN_MODIFY_DIR
should be implemented IMO and not as an attempt to reject an opposing proposal.

> This minimalistic approach has several advantages:
> * Does not require changes to the fanotify userspace API or the
> fsnotify in-kernel framework, apart from adding a new event.

About the argument of not having to change in-kernel framework,
I don't think it should be a consideration at all.
I claim that my 1st fsnotify cleanup series is an improvement of the framework
(https://github.com/amir73il/linux/commits/fsnotify_dentry)
regardless of additional functionality.
So changing the in-kernel framework by adding complexity may be bad,
but changing it by simplifying and making code more readable should not
be an argument against the "non-minimalistic" approach.

About the change to usespace API, if you strip off the *added* functionality
of super block watch from my patches, your proposal for user API looks
like this:

fanotify_mark(fan_fd, FAN_MARK_ADD|FAN_MARK_MOUNT, \
FAN_MODIFY_DIR|FAN_ONDIR,

And my proposal for user API looks like this:

fanotify_mark(fan_fd, FAN_MARK_ADD|FAN_MARK_MOUNT, \
FAN_MOVE|FAN_CREATE|FAN_DELETE,

You can see why my proposal for user API is not any less minimalistic
and it is fully compatible with existing intotify API for the same events.

I understand why it is hard to see this behind my added functionality,
so I will post a minimalistic branch and test program as an example.

> Especially doesn't complicate it by adding string fields.

So the string fields in my proposal are optional.
userspace opts-in to get them by specifying the FAN_EVENT_INFO_NAME flag:

fanotify_init(FAN_CLOEXEC | FAN_CLASS_NOTIF |
FAN_EVENT_INFO_PARENT | FAN_EVENT_INFO_NAME,

If you don't specify FAN_EVENT_INFO_NAME, you can get filename events
FAN_MOVE|FAN_CREATE|FAN_DELETE without the name.

What you do get is the file descriptor of the parent. sounds familiar? ;-)

The flag FAN_EVENT_INFO_PARENT in an explicit opt-in to get parent fd
instead of victim id on specific events.
If FAN_EVENT_INFO_PARENT is not specified in fanotify_init()
the filename events FAN_MOVE|FAN_CREATE|FAN_DELETE are masked
out of fanotify_mark().

This is important because your patchs adds FAN_MODIFY_DIR to
FAN_ALL_EVENTS (as does mine).
So old fanotify userspace code, compiled with new kernel headers, with:

fanotify_mark(fan_fd, FAN_MARK_ADD|FAN_MARK_MOUNT,
FAN_ALL_EVENTS,

Will start getting your FAN_MODIFY_DIR events and those programs
might have wrong assumptions about the provided fd.

So I chose to have stronger backward compatibility, and added the opt-in
FAN_EVENT_INFO_PARENT flag.

> * Has simple and clear semantics, even with multiple renames occurring
> in parallel etc. In case of any inconsistencies, one can simply wait
> for a while and rescan again.
> * FAN_MODIFY_DIR events are easily merged in case of multiple
> operations on a directory (e.g. deleting all files).
>

I agree those 2 are important advantages.
You could get them with my proposed API when dropping the
FAN_EVENT_INFO_NAME flag.

Thanks for pointing that out.

> Signed-off-by: Filip ÅtÄdronskà <r.lkml@xxxxxxxxxx>
>
> ---
>
> An alternative might be to create wrapper functions like
> vfs_path_(rename|unlink|...). They could also take care of calling
> security_path_(rename|unlink|...), which is currently also up to
> the indvidual callers (possibly with a flag because it might not
> be always desired).
>
> An alternative was proposed by Amir Goldstein in several long series of
> patches that add superblock-scoped (as opposed to vfsmount-scoped)
> fanotify watches and specific dentry manipulation events with filenames:
>
> http://lkml.kernel.org/r/1481984434-13293-1-git-send-email-amir73il%40gmail.com
> http://lkml.kernel.org/r/1482247207-4424-1-git-send-email-amir73il%40gmail.com
> http://lkml.kernel.org/r/1476126784-12520-1-git-send-email-amir73il%40gmail.com
> http://lkml.kernel.org/r/1489411223-12081-1-git-send-email-amir73il%40gmail.com
>
> There is large but not complete overlap between that proposal and
> mine (which is was originally created over a year ago, before Amir's
> work, but never posted).
>

I am proud of the role I played to help your proposal see the day of light :-)

> I think the superblock watch idea is very interesting because it might
> in the future allow reporing fsnotify events from remote and virtual
> filesystems. So I'm posting this more as a potential source of more
> ideas for discussion, or a fallback proposal in case Amir's patches
> don't make it.
> ---
> fs/notify/fanotify/fanotify.c | 1 +
> include/linux/fsnotify.h | 17 +++++++++++++++++
> include/linux/fsnotify_backend.h | 1 +
> include/uapi/linux/fanotify.h | 5 ++++-
> 4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index bbc175d4213d..5178b06c338c 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -186,6 +186,7 @@ static int fanotify_handle_event(struct fsnotify_group *group,
>
> BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
> BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
> + BUILD_BUG_ON(FAN_MODIFY_DIR != FS_MODIFY_DIR);
> BUILD_BUG_ON(FAN_CLOSE_NOWRITE != FS_CLOSE_NOWRITE);
> BUILD_BUG_ON(FAN_CLOSE_WRITE != FS_CLOSE_WRITE);
> BUILD_BUG_ON(FAN_OPEN != FS_OPEN);
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index b43d3f5bd9ea..00fb87c975d6 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -208,6 +208,23 @@ static inline void fsnotify_modify(struct file *file)
> }
>
> /*
> + * fsnotify_modifydir - directory contents were changed
> + * (as a result of rename, creat, unlink, etc.)
> + */
> +static inline void fsnotify_modify_dir(struct path *path)
> +{
> + struct inode *inode = path->dentry->d_inode;
> + __u32 mask = FS_MODIFY_DIR;
> +
> + if (S_ISDIR(inode->i_mode))
> + mask |= FS_ISDIR;

It is going to be somewhat confusing for users to understand
the difference between FS_MODIFY_DIR and FS_MODIFY_DIR|FS_ISDIR.
IMO, it is much more intuitive (and compatible with intoify semantics)
when users
get specific event information, e.g. FS_DELETE and FS_DELETE|FS_ISDIR.


> + else
> + return;
> +
> + fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
> +}
> +
> +/*
> * fsnotify_open - file was opened
> */
> static inline void fsnotify_open(struct file *file)
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 487246546ebe..7751b337ec31 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -42,6 +42,7 @@
>
> #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */
> #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
> +#define FS_MODIFY_DIR 0x00040000 /* directory changed (rename/unlink/...) */
>
> #define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */
> #define FS_ISDIR 0x40000000 /* event occurred against dir */
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 030508d195d3..f14e048d492a 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -15,6 +15,8 @@
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
>
> +#define FAN_MODIFY_DIR 0x00040000 /* directory changed (rename/unlink/...) */
> +
> #define FAN_ONDIR 0x40000000 /* event occurred against dir */
>
> #define FAN_EVENT_ON_CHILD 0x08000000 /* interested in child events */
> @@ -67,7 +69,8 @@
> #define FAN_ALL_EVENTS (FAN_ACCESS |\
> FAN_MODIFY |\
> FAN_CLOSE |\
> - FAN_OPEN)
> + FAN_OPEN |\
> + FAN_MODIFY_DIR)
>
> /*
> * All events which require a permission response from userspace
> --
> 2.11.1
>