Re: [PATCH RFC] proc: Fix a dentry lock race between release_task and lookup

From: Zhihao Cheng
Date: Tue May 31 2022 - 21:27:20 EST


Hi,
If I understand correctly you are saying that under some circumstances
this code runs slow, and you are proposing an optimization.


That optimization is to change the content of the pid->inodes list
from all directories under that pid, to just the /proc/<tgid> and
/proc/<tgid>/task/<pid>.


Yes and yes.

The justification being that d_invalidate on the parent directory will
invalidate all children. So only those two directories are interesting
from a d_invalidate point of view.

Absolutely right.
That seems like a valid optimization.

This could also count as a regression fix if you can show how the
performance changed poorly when the pid->inodes change was introduced
and how the performance improves with your change. I currently only
see that you hit a pathological case and you are correcting it.

There is a reproducer attached in commit msg:
https://bugzilla.kernel.org/show_bug.cgi?id=216054

Create 200 tasks, each task open one file for 50,000 times. Kill all tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr).
Before fix:
$ time killall -wq aa
real 4m40.946s # During this period, we can see 'ps' and 'systemd' taking high cpu usage.

After fix:
$ time killall -wq aa
real 1min~ # During this period, we can see 'systemd' taking high cpu usage.


As for the actual code change I think it would be better to
remove the code from proc_pid_make_inode and make a helper
proc_pid_make_base_inode that performs the extra work of
adding to the pid->list. Not adding a flag makes the code
easier to follow.

Agree, will send a v2.
Something like the code below.

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d654ce7150fd..9d025e70ddc3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1915,11 +1915,6 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
/* Let the pid remember us for quick removal */
ei->pid = pid;
- if (S_ISDIR(mode)) {
- spin_lock(&pid->lock);
- hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
- spin_unlock(&pid->lock);
- }
task_dump_owner(task, 0, &inode->i_uid, &inode->i_gid);
security_task_to_inode(task, inode);
@@ -1932,6 +1927,27 @@ struct inode *proc_pid_make_inode(struct super_block * sb,
return NULL;
}
+struct inode *proc_pid_make_base_inode(struct super_block * sb,
+ struct task_struct *task, umode_t mode)
+{
+ struct inode * inode;
+ struct proc_inode *ei;
+ struct pid *pid;
+
+ inode = proc_pid_make_inode(sb, task, mode);
+ if (!inode)
+ return NULL;
+
+ /* Let proc_flush_pid find this directory inode */
+ ei = PROC_I(inode);
+ pid = ei->pid;
+ spin_lock(&pid->lock);
+ hlist_add_head_rcu(&ei->sibling_inodes, &pid->inodes);
+ spin_unlock(&pid->lock);
+
+ return inode;
+}
+
int pid_getattr(struct user_namespace *mnt_userns, const struct path *path,
struct kstat *stat, u32 request_mask, unsigned int query_flags)
{
@@ -3351,7 +3367,7 @@ static struct dentry *proc_pid_instantiate(struct dentry * dentry,
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);
@@ -3650,7 +3666,7 @@ static struct dentry *proc_task_instantiate(struct dentry *dentry,
struct task_struct *task, const void *ptr)
{
struct inode *inode;
- inode = proc_pid_make_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
+ inode = proc_pid_make_base_inode(dentry->d_sb, task, S_IFDIR | S_IRUGO | S_IXUGO);
if (!inode)
return ERR_PTR(-ENOENT);

Eric
.