[PATCH v5 0/5] vfs: Use dlock list for SB's s_inodes list

From: Waiman Long
Date: Tue Aug 09 2016 - 12:54:03 EST


v4->v5:
- Rebased the patch to 4.8-rc1 (changes to fs/fs-writeback.c were
dropped).
- Use kcalloc() instead of percpu_alloc() to allocate the dlock list
heads structure as suggested by Christoph Lameter.
- Replaced patch 5 with another one that makes sibling CPUs use the
same dlock list head, thus reducing the number of list heads that
need to be maintained.

v3->v4:
- As suggested by Al, encapsulate the dlock list mechanism into
dlist_for_each_entry() and dlist_for_each_entry_safe(), which
are the equivalents of list_for_each_entry() and
list_for_each_entry_safe() for regular linked lists. That simplifies
the changes in the call sites that perform dlock list iterations.
- Add a new patch to make the percpu head structure cacheline aligned
to prevent cacheline contention from disrupting the performance
of nearby percpu variables.

v2->v3:
- Remove the 2 persubnode API patches.
- Merge __percpu tag patch 2 into patch 1.
- As suggested by Tejun Heo, restructure the dlock_list_head data
structure to hide the __percpu tag and rename some of the functions
and structures.
- Move most of the code from dlock_list.h to dlock_list.c and export
the symbols.

v1->v2:
- Add a set of simple per-subnode APIs that are between percpu and
per-node in granularity.
- Make the dlock list use the per-subnode APIs so as to reduce the
total number of separate linked lists that need to be managed
and iterated.
- There is no change in patches 1-5.

This is a follow-up to the following patchset:

[PATCH v7 0/4] vfs: Use per-cpu list for SB's s_inodes list
https://lkml.org/lkml/2016/4/12/1009

Patch 1 introduces the dlock list. The list heads are allocated
by kcalloc() instead of percpu_alloc(). This may slightly increase
cacheline contention when multiple CPUs are accessing the dlock list,
but it improves performance when the whole dlock list needs to be
iterated.
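
The idea behind patch 1 can be sketched in userspace C roughly as
follows. This is only an illustration, not the code in lib/dlock-list.c:
the dl_* names are hypothetical, a C11 atomic_flag spinlock stands in
for the kernel spinlock, calloc() stands in for kcalloc(), and a
caller-supplied index stands in for the current CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct dl_head;

/* Hypothetical node: remembers which head it is queued on. */
struct dl_node {
	struct dl_node *prev, *next;
	struct dl_head *head;
};

/* One list plus the lock that protects it. */
struct dl_head {
	struct dl_node list;	/* circular sentinel */
	atomic_flag lock;	/* stand-in for a kernel spinlock */
};

static void dl_lock(struct dl_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void dl_unlock(struct dl_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Allocate all heads as one contiguous array, as kcalloc() would. */
static struct dl_head *dl_alloc(size_t nr)
{
	struct dl_head *heads = calloc(nr, sizeof(*heads));

	if (!heads)
		return NULL;
	for (size_t i = 0; i < nr; i++) {
		heads[i].list.prev = heads[i].list.next = &heads[i].list;
		atomic_flag_clear(&heads[i].lock);
	}
	return heads;
}

/* Insert at the front of list idx; only that list's lock is taken. */
static void dl_add(struct dl_head *heads, size_t idx, struct dl_node *node)
{
	struct dl_head *h = &heads[idx];

	dl_lock(h);
	node->next = h->list.next;
	node->prev = &h->list;
	h->list.next->prev = node;
	h->list.next = node;
	node->head = h;
	dl_unlock(h);
}

/*
 * Remove a node from whichever list it is on. The sketch assumes no
 * concurrent add/del of the same node.
 */
static void dl_del(struct dl_node *node)
{
	struct dl_head *h = node->head;

	assert(h != NULL);
	dl_lock(h);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = NULL;
	node->head = NULL;
	dl_unlock(h);
}

/* Whole-list iteration: visit each head in turn, one lock at a time. */
static size_t dl_count(struct dl_head *heads, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		dl_lock(&heads[i]);
		for (struct dl_node *p = heads[i].list.next;
		     p != &heads[i].list; p = p->next)
			n++;
		dl_unlock(&heads[i]);
	}
	return n;
}
```

Additions and deletions touch only one head, while dl_count() shows
why a contiguous array of heads is convenient when every list has to
be walked.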

Patch 2 cleans up the fsnotify_unmount_inodes() function by making
the code simpler and more standard.

Patch 3 replaces the use of list_for_each_entry_safe() in
evict_inodes() and invalidate_inodes() with list_for_each_entry().

Patch 4 modifies the superblock and inode structures to use the dlock
list. The corresponding functions that reference those structures
are modified.

Patch 5 makes the sibling CPUs use the same dlock list head to reduce
the number of list heads that need to be iterated.
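
As a rough illustration of the sibling sharing in patch 5, the mapping
from CPU to list head could be computed along these lines. This is a
userspace sketch under stated assumptions, not the code in the patch:
map_cpus_to_heads() and the first_sibling[] input (the lowest-numbered
CPU in each sibling group) are hypothetical names made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * first_sibling[cpu] is assumed to hold the lowest-numbered CPU that
 * shares a core with cpu (hypothetical input for this sketch).
 * Fills head_idx[cpu] with the list head index each CPU should use
 * and returns the number of list heads needed.
 */
static size_t map_cpus_to_heads(const int *first_sibling, int *head_idx,
				size_t nr_cpus)
{
	size_t nr_heads = 0;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		size_t first = (size_t)first_sibling[cpu];

		/* The first sibling must already have been visited. */
		assert(first <= cpu);
		if (first == cpu)
			head_idx[cpu] = (int)nr_heads++;  /* new core, new head */
		else
			head_idx[cpu] = head_idx[first];  /* reuse sibling's head */
	}
	return nr_heads;
}
```

With two hardware threads per core, this halves the number of heads
that a full iteration has to visit, at the cost of the two siblings
contending on the same lock.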

Jan Kara (2):
  fsnotify: Simplify inode iteration on umount
  vfs: Remove unnecessary list_for_each_entry_safe() variants

Waiman Long (3):
  lib/dlock-list: Distributed and lock-protected lists
  vfs: Use dlock list for superblock's inode list
  lib/dlock-list: Make sibling CPUs share the same linked list

 fs/block_dev.c             |   9 +-
 fs/drop_caches.c           |   9 +-
 fs/inode.c                 |  38 +++----
 fs/notify/inode_mark.c     |  52 ++-------
 fs/quota/dquot.c           |  14 +--
 fs/super.c                 |   7 +-
 include/linux/dlock-list.h | 230 +++++++++++++++++++++++++++++++++++++
 include/linux/fs.h         |   8 +-
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 268 ++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 548 insertions(+), 89 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c