Re: [PATCH 2/2] fsnotify: allow sleepable child flag update

From: Amir Goldstein
Date: Tue Oct 18 2022 - 01:37:01 EST


On Tue, Oct 18, 2022 at 7:12 AM Stephen Brennan
<stephen.s.brennan@xxxxxxxxxx> wrote:
>
> With very large d_subdirs lists, iteration can take a long time. Since
> iteration needs to hold parent->d_lock, this can trigger soft lockups.
> It would be best to make this iteration sleepable. Since we have the
> inode locked exclusive, we can drop the parent->d_lock and sleep,
> holding a reference to a child dentry, and continue iteration once we
> wake.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx>
> ---
> fs/notify/fsnotify.c | 72 ++++++++++++++++++++++++++++++--------------
> 1 file changed, 50 insertions(+), 22 deletions(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index e887a195983b..499b19272b32 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -102,10 +102,13 @@ void fsnotify_sb_delete(struct super_block *sb)
> * on a child we run all of our children and set a dentry flag saying that the
> * parent cares. Thus when an event happens on a child it can quickly tell
> * if there is a need to find a parent and send the event to the parent.
> + *
> + * Context: inode locked exclusive

Please add code assertion

WARN_ON_ONCE(!inode_is_locked(inode));

and it probably wouldn't hurt to add an inline wrapper
fsnotify_update_child_dentry_flags()
that locks the inode and calls this helper.

> */
> void __fsnotify_update_child_dentry_flags(struct inode *inode)
> {
> - struct dentry *alias;
> + struct dentry *child, *alias, *last_ref = NULL;
> + struct list_head *p;
> int watched;
>
> if (!S_ISDIR(inode->i_mode))
> @@ -114,30 +117,55 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
> /* determine if the children should tell inode about their events */
> watched = fsnotify_inode_watches_children(inode);
>
> - spin_lock(&inode->i_lock);
> - /* run all of the dentries associated with this inode. Since this is a
> - * directory, there damn well better only be one item on this list */
> - hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
> - struct dentry *child;
> -
> - /* run all of the children of the original inode and fix their
> - * d_flags to indicate parental interest (their parent is the
> - * original inode) */
> - spin_lock(&alias->d_lock);
> - list_for_each_entry(child, &alias->d_subdirs, d_child) {
> - if (!child->d_inode)
> - continue;
> + alias = d_find_any_alias(inode);

Please make the alias change in a separate patch.
It is not explained in commit message and it clutters
the diff which makes reviewing the actual logic changes
harder.

> +
> + /*
> + * These lists can get very long, so we may need to sleep during
> + * iteration. Normally this would be impossible without a cursor,
> + * but since we have the inode locked exclusive, we're guaranteed
> + * that the directory won't be modified, so whichever dentry we
> + * pick to sleep on won't get moved. So, start a manual iteration
> + * over d_subdirs which will allow us to sleep.
> + */
> + spin_lock(&alias->d_lock);
> + p = alias->d_subdirs.next;
>
> - spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
> - if (watched)
> - child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
> - else
> - child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
> - spin_unlock(&child->d_lock);
> + while (p != &alias->d_subdirs) {
> + child = list_entry(p, struct dentry, d_child);

IMO it would be better to use list iterator helpers.
What was wrong with list_for_each_entry()?
Why did you feel that you need to open code it?

> + if (need_resched()) {
> + /*
> + * We need to hold a reference while we sleep. But when
> + * we wake, dput() could free the dentry, invalidating
> + * the list pointers. We can't look at the list pointers
> + * until we re-lock the parent, and we can't dput() once
> + * we have the parent locked. So the solution is to hold
> + * onto our reference and free it the *next* time we drop
> + * alias->d_lock: either at the end of the function, or
> + * at the time of the next sleep.
> + */
> + dget(child);
> + spin_unlock(&alias->d_lock);
> + if (last_ref)
> + dput(last_ref);
> + last_ref = child;
> + cond_resched();
> + spin_lock(&alias->d_lock);
> }
> - spin_unlock(&alias->d_lock);
> + p = p->next;
> +
> + if (!child->d_inode)
> + continue;
> +
> + spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
> + if (watched)
> + child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
> + else
> + child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
> + spin_unlock(&child->d_lock);
> }
> - spin_unlock(&inode->i_lock);
> + spin_unlock(&alias->d_lock);
> + if (last_ref)
> + dput(last_ref);

Nit: if not needed. dput(NULL) works just fine.

Thanks,
Amir.