Re: [RESEND PATCH v7 1/8] kernfs: Introduce interface to access global kernfs_open_file_mutex.

From: Imran Khan
Date: Wed Apr 06 2022 - 09:34:16 EST


Hello Al,

On 6/4/22 12:24 am, Al Viro wrote:
[...]
>
> What for? Again, have kernfs_drain_open_files() do this:
> {
> 	struct kernfs_open_node *on;
> 	struct kernfs_open_file *of;
>
> 	if (!(kn->flags & (KERNFS_HAS_MMAP | KERNFS_HAS_RELEASE)))
> 		return;
> 	if (rcu_dereference(kn->attr.open) == NULL)
> 		return;
> 	mutex_lock(&kernfs_open_file_mutex);
> 	// now ->attr.open is stable (all stores are under kernfs_open_file_mutex)
> 	on = rcu_dereference(kn->attr.open);
> 	if (!on) {
> 		mutex_unlock(&kernfs_open_file_mutex);
> 		return;
> 	}
> 	// on->files contents is stable
> 	list_for_each_entry(of, &on->files, list) {
> 		struct inode *inode = file_inode(of->file);
>
> 		if (kn->flags & KERNFS_HAS_MMAP)
> 			unmap_mapping_range(inode->i_mapping, 0, 0, 1);
>
> 		if (kn->flags & KERNFS_HAS_RELEASE)
> 			kernfs_release_file(kn, of);
> 	}
> 	mutex_unlock(&kernfs_open_file_mutex);
> }
>

I did something similar in [1], except that I was traversing
on->files under rcu_read_lock(), and this was a source of confusion.

> What's the problem? The caller has already guaranteed that no additions will
> happen. Once we'd grabbed kernfs_open_file_mutex, we know that
> * kn->attr.open value won't change until we drop the mutex
> * nothing gets removed from kn->attr.open->files until we drop the mutex
> so we can bloody well walk that list, blocking as much as we want.
>
> We don't need rcu_read_lock() there - we are already holding the mutex used
> by writers for exclusion among themselves. RCU *allows* lockless readers,
> it doesn't require all readers to be such. kernfs_notify() can be made
> lockless, this one can't and that's fine.
>

Thanks for explaining this. I had missed the mutual exclusion that
kernfs_open_file_mutex provides in this case.
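
To make the point concrete, here is a minimal userspace sketch of the same
pattern, assuming pthreads in place of the kernel mutex. All names
(open_file_mutex, drain_open_files, release_file and the structs) are
hypothetical stand-ins, not the kernfs code. Because every writer takes the
same mutex, the reader that holds it sees a stable pointer and a stable list,
and may block while walking it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernfs_open_file / kernfs_open_node / kernfs_node. */
struct open_file { struct open_file *next; int released; };
struct open_node { struct open_file *files; };
struct node { struct open_node *open; };

/* Every store to node->open and every list update is assumed to hold this. */
static pthread_mutex_t open_file_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for work that is allowed to block. */
static void release_file(struct open_file *of)
{
	of->released = 1;
}

/* Walk the list under the writers' mutex; returns the number of entries seen. */
static int drain_open_files(struct node *kn)
{
	struct open_node *on;
	struct open_file *of;
	int walked = 0;

	pthread_mutex_lock(&open_file_mutex);
	on = kn->open;		/* stable until the mutex is dropped */
	if (!on) {
		pthread_mutex_unlock(&open_file_mutex);
		return 0;
	}
	for (of = on->files; of; of = of->next) {	/* list is stable too */
		release_file(of);
		walked++;
	}
	pthread_mutex_unlock(&open_file_mutex);
	return walked;
}

/* Builds a two-entry list and drains it; returns -1 if an entry was missed. */
static int demo_drain(void)
{
	struct open_file b = { NULL, 0 };
	struct open_file a = { &b, 0 };
	struct open_node on = { &a };
	struct node kn = { &on };
	int walked = drain_open_files(&kn);

	return (walked == 2 && a.released && b.released) ? walked : -1;
}
```

No read-side RCU section appears anywhere in the walk; the mutex alone
provides the exclusion against writers.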

> BTW, speaking of kernfs_notify() - can calls of that come from NMI handlers?
> If not, I'd consider using llist for kernfs_notify_list...

As far as I can see, it is invoked from only three places:
cgroup_file_notify, sysfs_notify and sysfs_notify_dirent. So kernfs_notify
should not be invoked in NMI context. I will make the llist transition in
the next version.
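
For reference, the idea behind such a list can be sketched in userspace with
C11 atomics. This is only an illustration of the lock-free push / take-all
pattern, not the kernel's llist implementation, and every name in it
(lhead, lnode, lpush, ltake_all, notify_item) is hypothetical. Producers push
with a compare-and-swap loop, and the consumer detaches the whole batch with
a single exchange, so neither side needs a lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock-free singly linked list, userspace only. */
struct lnode { struct lnode *next; };
struct lhead { _Atomic(struct lnode *) first; };

/* Push one node; returns 1 if the list was empty before the push. */
static int lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old = atomic_load(&h->first);

	do {
		n->next = old;	/* on CAS failure, old is reloaded for us */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
	return old == NULL;
}

/* Detach every queued node in one atomic step; the list is empty afterwards. */
static struct lnode *ltake_all(struct lhead *h)
{
	return atomic_exchange(&h->first, (struct lnode *)NULL);
}

/* Hypothetical queued notification; node is the first member by design. */
struct notify_item { struct lnode node; int id; };

/* Queues three items, takes the batch, and returns how many were seen. */
static int demo_notify_batch(void)
{
	struct lhead pending;
	struct notify_item items[3] = { { { NULL }, 1 }, { { NULL }, 2 }, { { NULL }, 3 } };
	struct lnode *n;
	int seen = 0;
	int i;

	atomic_init(&pending.first, (struct lnode *)NULL);
	for (i = 0; i < 3; i++)
		lpush(&pending, &items[i].node);

	for (n = ltake_all(&pending); n; n = n->next)
		seen++;		/* entries come back newest first */

	/* A second take finds nothing: the first one emptied the list. */
	return ltake_all(&pending) == NULL ? seen : -1;
}
```

The return value of lpush() tells a producer whether it was the one that made
the list non-empty, which is a natural point to schedule the consumer.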

Thanks,
-- Imran

[1]
https://lore.kernel.org/lkml/20220324103040.584491-3-imran.f.khan@xxxxxxxxxx/