[PATCH v5 00/10] VFS hot tracking

From: zwu . kernel
Date: Mon Sep 16 2013 - 18:20:27 EST


From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>

This patchset introduces hot tracking in the VFS layer, which
keeps track of real disk I/O in memory. With it, you can easily
get more details about disk I/O and detect where the disk I/O
hot spots are. A specific FS can also make use of it for accurate
defragmentation, hot relocation support, etc.
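
As a rough user-space illustration of the idea (this is not the code in
fs/hot_tracking.c), the sketch below counts reads and writes per inode in
memory and reports the most frequently accessed inodes. The names FreqData,
HeatTracker, record_io and hottest are hypothetical, and "heat" here is
assumed to be a plain access count; the real heat calculation is described
in hot_tracking.txt.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FreqData:
    """Per-inode access counters (hypothetical layout, for illustration)."""
    nr_reads: int = 0
    nr_writes: int = 0

    @property
    def heat(self) -> int:
        # Assumption: heat is simply the total number of accesses.
        return self.nr_reads + self.nr_writes


class HeatTracker:
    """Toy in-memory tracker keyed by inode number."""

    def __init__(self) -> None:
        self.items: Dict[int, FreqData] = defaultdict(FreqData)

    def record_io(self, ino: int, is_write: bool) -> None:
        """Account one read or write against the given inode."""
        item = self.items[ino]
        if is_write:
            item.nr_writes += 1
        else:
            item.nr_reads += 1

    def hottest(self, n: int) -> List[int]:
        """Return up to n inode numbers, hottest first."""
        # Sort by heat descending; break ties by inode number.
        ranked = sorted(self.items.items(),
                        key=lambda kv: (-kv[1].heat, kv[0]))
        return [ino for ino, _ in ranked[:n]]
```

A caller would invoke record_io() on every tracked read or write and call
hottest(n) to find candidate hot spots.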

It is now time to send out V5 for external review; any comments
or ideas are appreciated. Thanks.

NOTE:

The patchset can be obtained via my kernel dev git on github:
git://github.com/wuzhy/kernel.git hot_tracking
If you're interested, you can also review them via
https://github.com/wuzhy/kernel/commits/hot_tracking

For usage, further information and the performance report,
please check hot_tracking.txt in Documentation and the following
links:
1.) http://lwn.net/Articles/525651/
2.) https://lkml.org/lkml/2012/12/20/199

Scalability and performance tests were run on this patchset
with fs_mark, ffsb and compilebench.

The perf testing was done on Linux 3.11.0+ with Intel(R) Core(TM)
i7-3770 CPU @ 3.40GHz with 8 CPUs, 16G RAM and 260G disk.

Below is the perf testing report:

1. fs_mark test

w/ : with hot tracking
w/o: without hot tracking

Columns: Count, Size, FSUse% (w/, w/o), Files/sec (w/, w/o), App Overhead (w/, w/o)
800000 1 5 5 5606.9 40486.6 7773339 8575934
1600000 1 5 5 1244.8 1194.8 8262292 8253933
2400000 1 6 6 1155.7 997.2 7640679 7854540
3200000 1 7 8 1079.7 1124.0 7373659 8121016
4000000 1 9 9 1169.4 1324.8 7961605 9598549
4800000 1 10 10 1259.8 1331.7 8992159 8743297
5600000 1 11 11 1337.7 1339.3 8675246 8029501
6400000 1 13 13 1346.7 1365.5 8613958 10018455
7200000 1 14 14 1339.8 1423.1 7885932 8466961
8000000 1 15 15 1353.0 1368.6 13543947 9727348
8800000 1 16 17 1460.7 1396.4 8744351 8034638
9600000 1 18 18 1462.9 1415.4 11678864 8557992
10400000 1 19 19 1503.8 1457.6 8984918 9696330
11200000 1 20 20 1521.9 1491.4 8732741 8307835
12000000 1 21 22 1617.7 1556.0 12948158 8776620
12800000 1 23 23 1518.0 1572.3 8470307 8652605
13600000 1 24 24 1595.8 1570.5 11476909 8622940
14400000 1 25 26 1651.8 1722.1 11864599 9646962
15200000 1 26 27 1696.8 1619. 10679127 8472579
16000000 1 28 28 1567.4 1652.3 8756616 8713324
16800000 1 29 29 1599.9 1683.7 10982360 9084005
17600000 1 31 30 1671.3 1699.6 9559853 8388523
18400000 1 32 32 1567.3 1666.7 10576088 11717888
19200000 1 33 33 1668.4 1606.0 8657168 9063387
20000000 1 34 34 1654.1 1521.5 11115008 8384464
20800000 1 36 36 1637.6 1666.2 9964151 8176858
21600000 1 37 37 1598.7 1677.0 8648364 8190571
22400000 1 38 38 1688.8 1674.0 8881927 12847479
23200000 1 39 39 1627.0 1648.2 8707422 9350644
24000000 1 41 41 1704.7 1718.9 9525011 8437322
24800000 1 42 42 1628.2 1649.7 8445795 9195963
25600000 1 43 43 1690.4 1647.3 10444544 10808578
26400000 1 44 44 1597.4 1582.4 8956981 12286644
27200000 1 46 46 1677.7 1710.4 8244101 9492204
28000000 1 47 47 1664.9 1640.9 8860491 8683678
28800000 1 48 48 1608.7 1670.8 8381652 12105478
29600000 1 50 50 1682.0 1652.4 13991121 8630876
30400000 1 51 51 1672.6 1743.2 8853590 10377349
31200000 1 52 52 1648.5 1691.3 11290708 8407930
32000000 1 53 53 1649.5 1708.1 11647884 10120780
32800000 1 55 55 1725.2 1663.4 9641226 10092158
33600000 1 56 56 1662.2 1668.9 12228440 8579953
34400000 1 57 57 1629.7 1688.0 8232209 8290118
35200000 1 59 59 1711.5 1733.5 8175308 9081545
36000000 1 60 60 1670.6 1742.4 9884533 8554858
36800000 1 61 61 1663.0 1654.8 13227858 9112083
37600000 1 62 62 1692.4 1663.0 8590629 8884916
38400000 1 64 64 1691.6 1617.1 9437834 11534400
39200000 1 65 65 1763.5 1646.3 10385440 9854624
40000000 1 66 66 1686.8 1643.8 8860676 9939637
40800000 1 67 67 1542.9 1652.9 9280078 17640321
41600000 1 68 69 1696.2 1655.4 8972165 9473507
42400000 1 70 70 1637.8 1685.2 8294407 8767330
43200000 1 71 71 1712.8 1739.8 14135589 9175591
44000000 1 72 73 1692.4 1632.2 10287428 9130585
44800000 1 73 74 1794.9 1685.0 10727955 9486110
45600000 1 75 75 1438.1 1624.3 8476478 9232791
46400000 1 76 76 1761.2 1768.7 8644609 15745264
47200000 1 77 77 1684.2 1505.7 10269613 12412119
48000000 1 79 79 1647.0 1713.2 8287281 15352189
48800000 1 80 80 1665.7 1675.0 17468300 9012407
49600000 1 81 81 1632.5 1692.5 8178082 8865803
50400000 1 83 83 1584.5 1752.1 12857867 11970443

2. FFSB test

Columns: v1 = w/ hot tracking, v2 = w/o hot tracking, ratio = (v1-v2)/v2
large_file_create
1 thread
- Trans/sec 28091.76 28126.31 -0.12%
- Throughput 110MB/sec 110MB/sec +0.00%
- %CPU 10.7% 11.2% -4.47%
- Trans/%CPU 2625.4 2511.28 +4.54%

8 threads
- Trans/sec 27980.47 28140.34 -0.57%
- Throughput 109MB/sec 110MB/sec -0.91%
- %CPU 12.3% 12.8% -3.90%
- Trans/%CPU 2274.83 2198.46 +3.47%

16 threads
- Trans/sec 27764.36 27940.96 -0.63%
- Throughput 108MB/sec 109MB/sec -0.92%
- %CPU 12.8% 13.7% -6.57%
- Trans/%CPU 2169.09 2039.49 +6.35%

32 threads
- Trans/sec 27461.82 27624.48 -0.59%
- Throughput 107MB/sec 108MB/sec -0.93%
- %CPU 13.7% 14.4% -4.86%
- Trans/%CPU 2004.51 1918.37 +4.49%

large_file_seq_read
1 thread
- Trans/sec 34121.46 34838.65 -2.06%
- Throughput 133MB/sec 136MB/sec -2.21%
- %CPU 8.8% 8.8% +0.00%
- Trans/%CPU 3877.44 3958.94 -2.06%

8 threads
- Trans/sec 10883.15 11679.40 -6.82%
- Throughput 42.5MB/sec 45.6MB/sec -6.80%
- %CPU 3.3% 3.4% -2.94%
- Trans/%CPU 3297.92 3435.12 -3.99%

16 threads
- Trans/sec 5760.16 6193.20 -6.99%
- Throughput 22.5MB/sec 24.2MB/sec -7.02%
- %CPU 1.8% 1.9% -5.26%
- Trans/%CPU 3200.09 3259.58 -1.83%

32 threads
- Trans/sec 5470.50 5490.12 -0.36%
- Throughput 21.4MB/sec 21.4MB/sec +0.00%
- %CPU 1.7% 1.7% +0.00%
- Trans/%CPU 3217.94 3229.48 -0.36%

random_write
1 thread
- Trans/sec 1611.99 1582.57 +1.86%
- Throughput 220MB/sec 216MB/sec +1.85%
- %CPU 0.6% 0.6% +0.00%
- Trans/%CPU 2686.65 2637.62 +1.86%

8 threads
- Trans/sec 2215.59 2292.57 -3.36%
- Throughput 303MB/sec 313MB/sec -3.39%
- %CPU 1.4% 1.5% -6.67%
- Trans/%CPU 1582.56 1528.38 +3.54%

16 threads
- Trans/sec 2068.52 1935.96 +6.85%
- Throughput 283MB/sec 265MB/sec +6.79%
- %CPU 1.3% 1.3% +0.00%
- Trans/%CPU 1591.17 1464.8 +8.63%

32 threads
- Trans/sec 1764.28 1875.23 -5.92%
- Throughput 241MB/sec 256MB/sec -5.86%
- %CPU 1.2% 1.3% -7.69%
- Trans/%CPU 1470.23 1442.48 +1.92%

random_read
1 thread
- Trans/sec 222.84 224.28 -0.64%
- Throughput 891KB/sec 897KB/sec -0.67%
- %CPU 1.1% 1.0% +10.0%
- Trans/%CPU 202.58 224.28 -9.68%

8 threads
- Trans/sec 143.30 136.47 +5.01%
- Throughput 573KB/sec 546KB/sec +4.95%
- %CPU 0.5% 0.5% +0.00%
- Trans/%CPU 286.60 272.94 +5.01%

16 threads
- Trans/sec 105.17 103.75 +1.37%
- Throughput 421KB/sec 415KB/sec +1.45%
- %CPU 0.5% 0.5% +0.00%
- Trans/%CPU 210.34 207.5 +1.37%

32 threads
- Trans/sec 105.78 103.39 +2.31%
- Throughput 423KB/sec 414KB/sec +2.17%
- %CPU 0.5% 0.5% +0.00%
- Trans/%CPU 211.56 206.78 +2.31%

mail_server
1 thread
- Trans/sec [read] 433.23 446.68 -3.01%
- Throughput [read] 1.7MB/sec 1.75MB/sec -2.86%
- Trans/sec [write] 224.06 213.84 +4.78%
- Throughput [write] 889KB/sec 848KB/sec +4.83%
- %CPU 0.8% 0.8% +0.00%
- Trans/%CPU [read] 541.54 558.35 -3.01%
- Trans/%CPU [write] 280.08 267.3 +4.78%

8 threads
- Trans/sec [read] 430.47 435.84 -1.23%
- Throughput [read] 1.69MB/sec 1.71MB/sec -1.17%
- Trans/sec [write] 198.18 207.61 -4.54%
- Throughput [write] 786KB/sec 823KB/sec -4.50%
- %CPU 0.9% 0.9% +0.00%
- Trans/%CPU [read] 478.3 484.27 -1.23%
- Trans/%CPU [write] 220.2 230.68 -4.54%

16 threads
- Trans/sec [read] 326.05 347.85 -6.27%
- Throughput [read] 1.28MB/sec 1.37MB/sec -6.57%
- Trans/sec [write] 187.69 177.59 +5.69%
- Throughput [write] 744KB/sec 705KB/sec +5.53%
- %CPU 0.9% 0.9% +0.00%
- Trans/%CPU [read] 362.28 386.5 -6.27%
- Trans/%CPU [write] 208.54 197.2 +5.75%

32 threads
- Trans/sec [read] 388.04 419.52 -7.50%
- Throughput [read] 1.53MB/sec 1.65MB/sec -7.27%
- Trans/sec [write] 204.70 207.50 -1.35%
- Throughput [write] 811KB/sec 823KB/sec -1.46%
- %CPU 1.2% 1.2% +0.00%
- Trans/%CPU [read] 323.37 349.6 -7.50%
- Trans/%CPU [write] 170.58 172.92 -1.35%

3. Compilebench test

                        w/ hot tracking     w/o hot tracking    ratio
                        v1                  v2                  (v1-v2)/v2
initial create          59.33 MB/s          63.25 MB/s          -6.20%
create                  91.81 MB/s          81.12 MB/s          +13.18%
patch                   12.39 MB/s          14.94 MB/s          -17.07%
compile                 470.24 MB/s         442.08 MB/s         +6.37%
clean                   2205.16 MB/s        1992.06 MB/s        +10.70%
read tree               136.77 MB/s         142.41 MB/s         -3.96%
read compiled tree      46.83 MB/s          50.08 MB/s          -6.49%
delete tree             3.48 seconds        3.02 seconds        +15.23%
delete compiled tree    3.94 seconds        3.98 seconds        -1.01%
stat tree               1.45 seconds        1.66 seconds        -12.65%
stat compiled tree      0.71 seconds        0.86 seconds        -17.44%
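
The ratio column in the FFSB and Compilebench tables is (v1-v2)/v2
expressed as a percentage. A minimal sketch of that arithmetic, applied to
two Compilebench rows from the report; the helper name ratio_percent is
hypothetical and not part of the patchset:

```python
def ratio_percent(v1: float, v2: float) -> float:
    """Relative change of v1 (w/ hot tracking) against v2 (w/o), in percent."""
    return (v1 - v2) / v2 * 100.0


# Compilebench "create": 91.81 MB/s with vs 81.12 MB/s without hot tracking.
print(f"{ratio_percent(91.81, 81.12):+.2f}%")  # prints +13.18%

# Compilebench "patch": 12.39 MB/s with vs 14.94 MB/s without hot tracking.
print(f"{ratio_percent(12.39, 14.94):+.2f}%")  # prints -17.07%
```

For the rows measured in MB/s a positive ratio means higher throughput with
hot tracking; for the rows measured in seconds a negative ratio means the
run with hot tracking finished faster.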

Changelog from v4:
- Added fs_mark, FFSB and compilebench perf testing reports [viro]
- Covered mmap() now [viro]
- Removed list_sort() in hot_update_worker() to avoid locking contention
and cacheline bouncing [viro]
- Removed a /proc interface to control low memory usage [Chandra]
- Adjusted shrinker support due to the change of public shrinker APIs [zwu]
- Fixed the missing locking when hot_inode_item_put() is called
in ioctl_heat_info() [viro]
- Fixed some locking contention issues [zwu]

v4:
- Removed debugfs support, but left it on the TODO list [viro, Chandra]
- Killed HOT_DELETING and HOT_IN_LIST flag [viro]
- Fixed unlink issues [viro]
- Fixed the lookup leak (both for inode and range)
on race with unlink [viro]
- Killed hot_comm_item and split the functions which take it [viro]
- Fixed some other issues [zwu, Chandra]

v3:
- Added memory capping function for hot items [Zhiyong]
- Cleaned up aging function [Zhiyong]

v2:
- Refactored to be under RCU [Chandra Seetharaman]
- Merged some code changes [Chandra Seetharaman]
- Fixed some issues [Chandra Seetharaman]

v1:
- Solved 64 bits inode number issue. [David Sterba]
- Embed struct hot_type in struct file_system_type [Darrick J. Wong]
- Cleaned up some issues [David Sterba]
- Use a static hot debugfs root [Greg KH]

rfcv4:
- Introduce hot func registering framework [Zhiyong]
- Remove global variable for hot tracking [Zhiyong]
- Add btrfs hot tracking support [Zhiyong]

rfcv3:
1.) Rewrote debugfs support based on seq_file operations. [Dave Chinner]
2.) Refactored workqueue support. [Dave Chinner]
3.) Turned some macros into tunables [Zhiyong, Liu Zheng]
TIME_TO_KICK, and HEAT_UPDATE_DELAY
4.) Cleaned up a lot of other issues [Dave Chinner]


rfcv2:
1.) Converted to Radix trees, not RB-tree [Zhiyong, Dave Chinner]
2.) Added memory shrinker [Dave Chinner]
3.) Converted to one workqueue to update map info periodically [Dave Chinner]
4.) Cleaned up a lot of other issues [Dave Chinner]

rfcv1:
1.) Reduce new files and put all in fs/hot_tracking.[ch] [Dave Chinner]
2.) The first three patches can probably just be flattened into one.
[Marco Stornelli, Dave Chinner]

Dave Chinner (1):
VFS hot tracking, xfs: Add hot tracking support

Zhi Yong Wu (9):
VFS hot tracking: Define basic data structures and functions
VFS hot tracking: Track IO and record heat information
VFS hot tracking: Add a workqueue to move items between hot maps
VFS hot tracking: Add shrinker functionality to curtail memory usage
VFS hot tracking: Add an ioctl to get hot tracking information
VFS hot tracking: Add a /proc interface to make the interval tunable
VFS hot tracking: Add a /proc interface to control memory usage
VFS hot tracking: Add documentation
VFS hot tracking, btrfs: Add hot tracking support

Documentation/filesystems/00-INDEX | 2 +
Documentation/filesystems/hot_tracking.txt | 207 ++++++++
fs/Makefile | 2 +-
fs/btrfs/ctree.h | 1 +
fs/btrfs/super.c | 22 +-
fs/compat_ioctl.c | 5 +
fs/dcache.c | 2 +
fs/direct-io.c | 5 +
fs/hot_tracking.c | 811 +++++++++++++++++++++++++++++
fs/hot_tracking.h | 66 +++
fs/ioctl.c | 71 +++
fs/namei.c | 3 +
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 18 +
include/linux/fs.h | 4 +
include/linux/hot_tracking.h | 146 ++++++
kernel/sysctl.c | 14 +
mm/filemap.c | 19 +-
mm/page-writeback.c | 13 +
mm/readahead.c | 6 +
20 files changed, 1414 insertions(+), 4 deletions(-)
create mode 100644 Documentation/filesystems/hot_tracking.txt
create mode 100644 fs/hot_tracking.c
create mode 100644 fs/hot_tracking.h
create mode 100644 include/linux/hot_tracking.h

--
1.7.11.7
