Re: [PATCH v4] proc: Fix a dentry lock race between release_task and lookup

From: Zhihao Cheng
Date: Mon Jul 11 2022 - 23:07:15 EST


On 2022/6/10 16:09, Zhihao Cheng wrote:
On 2022/6/1 14:23, Zhihao Cheng wrote:
ping again.
friendly ping
Commit 7bc3e6e55acf06 ("proc: Use a list of inodes to flush from proc")
moved proc_flush_task() behind __exit_signal(). As a result, systemd can
show high CPU usage for a long period while releasing a task when the
following two processes run concurrently:

   systemd                                 ps
kernel_waitid                 stat(/proc/tgid)
   do_wait                       filename_lookup
     wait_consider_task            lookup_fast
       release_task
         __exit_signal
           __unhash_process
             detach_pid
               __change_pid // remove task->pid_links
                                      d_revalidate -> pid_revalidate // 0
                                      d_invalidate(/proc/tgid)
                                        shrink_dcache_parent(/proc/tgid)
                                          d_walk(/proc/tgid)
                                            spin_lock_nested(/proc/tgid/fd)
                                            // iterating opened fd
         proc_flush_pid                                    |
            d_invalidate (/proc/tgid/fd)                   |
               shrink_dcache_parent(/proc/tgid/fd)         |
                 shrink_dentry_list(subdirs)               ↓
                   shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock

d_invalidate() removes the dentry from the hash first, so why does
proc_flush_pid() process dentry '/proc/tgid/fd' before dentry '/proc/tgid'?
Because proc_pid_make_inode() adds proc inodes with hlist_add_head_rcu(),
so the list holds them in reverse order of creation. But proc should not
add any inodes under '/proc/tgid' except '/proc/tgid/task/pid'. Fix it by
adding an inode to 'pid->inodes' only if the inode is /proc/tgid or
/proc/tgid/task/pid.
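
The ordering argument above can be illustrated with a small userspace
sketch. It is not kernel code and not the patch itself: struct fake_inode,
enum kind, add_head() and add_if_base() are hypothetical names that only
model how head insertion puts the most recently added inode first, and how
a filter on the inode kind keeps '/proc/tgid/fd' off the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a proc directory inode. */
enum kind { TGID_DIR, TASK_PID_DIR, OTHER_DIR };

/* Hypothetical userspace stand-in for an entry on 'pid->inodes'. */
struct fake_inode {
	const char *path;
	enum kind kind;
	struct fake_inode *next;
};

/* Head insertion: the same ordering effect as hlist_add_head_rcu(). */
void add_head(struct fake_inode **head, struct fake_inode *node)
{
	assert(head != NULL && node != NULL);
	node->next = *head;
	*head = node;
}

/*
 * Model of the filter described in the text: only /proc/tgid and
 * /proc/tgid/task/pid are tracked. Returns 1 if the node was added.
 */
int add_if_base(struct fake_inode **head, struct fake_inode *node)
{
	if (node->kind == OTHER_DIR)
		return 0;
	add_head(head, node);
	return 1;
}
```

A lookup of /proc/tgid/fd creates the '/proc/tgid' inode first and the
'/proc/tgid/fd' inode second. With add_head() alone, the list head is then
'/proc/tgid/fd', so a walker that starts at the head reaches it before
'/proc/tgid'. With add_if_base(), only '/proc/tgid' is on the list.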

Performance regression:
Create 200 tasks, each of which opens one file 50,000 times. Kill all
tasks once the number of open files exceeds 10,000,000
(cat /proc/sys/fs/file-nr).
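
A minimal sketch of the task-creation side of such a test follows. It is
an assumption about how the workload could be built, not the program used
for the measurements: open_many(), spawn_openers(), allocated_files() and
kill_and_reap() are hypothetical helpers, and the concurrent 'ps' side of
the race is not covered. Running it at full scale also needs raised
open-file limits.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open 'path' up to 'count' times and keep every descriptor open.
 * Returns how many opens succeeded; stops at the first failure. */
long open_many(const char *path, long count)
{
	long opened = 0;

	while (opened < count && open(path, O_RDONLY) >= 0)
		opened++;
	return opened;
}

/* Fork up to 'ntasks' children. Each child opens 'path' 'nopen' times
 * and then sleeps until it is killed. Returns the number started. */
int spawn_openers(pid_t *pids, int ntasks, long nopen, const char *path)
{
	int started = 0;

	assert(pids != NULL && path != NULL);
	for (int i = 0; i < ntasks; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			open_many(path, nopen);
			for (;;)
				pause();	/* hold the fds until SIGKILL */
		}
		pids[started++] = pid;
	}
	return started;
}

/* First field of /proc/sys/fs/file-nr, or -1 if it cannot be read. */
long allocated_files(void)
{
	long n = -1;
	FILE *f = fopen("/proc/sys/fs/file-nr", "r");

	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &n) != 1)
		n = -1;
	fclose(f);
	return n;
}

/* Kill every child, then reap it. Returns the number reaped. */
int kill_and_reap(const pid_t *pids, int n)
{
	int reaped = 0;

	for (int i = 0; i < n; i++)
		kill(pids[i], SIGKILL);
	for (int i = 0; i < n; i++)
		if (waitpid(pids[i], NULL, 0) == pids[i])
			reaped++;
	return reaped;
}
```

A driver would call spawn_openers() with 200 tasks and 50,000 opens, poll
allocated_files() until it exceeds 10,000,000, and then time
kill_and_reap().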

