Re: [LKP] Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression

From: Xing Zhengjun
Date: Tue Aug 18 2020 - 21:53:52 EST




On 7/22/2020 2:17 PM, Xing Zhengjun wrote:


On 7/15/2020 7:04 PM, Ritesh Harjani wrote:
Hello Xing,

On 4/7/20 1:30 PM, kernel test robot wrote:
Greeting,

FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit:


commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

    nr_threads: 10%
    disk: 1HDD
    testtime: 1s
    class: os
    cpufreq_governor: performance
    ucode: 0x500002c
    fs: ext4

I started looking into this issue. But with my unit testing, I didn't
find any perf issue with fiemap ioctl call. I haven't yet explored about
how stress-ng take fiemap performance numbers, it could be doing
something differently. But in my testing I just made sure to create a
file with large number of extents and used xfs_io -c "fiemap -v" cmd
to check how much time it takes to read all the entries in 1st
and subsequent iterations.


Setup comprised of qemu machine on x86_64 with latest linux branch.

1. created a file of 10G using fallocate. (this allocated unwritten
extents for this file).

2. Then I punched hole on every alternate block of file. This step took
a long time. And after sufficiently long time, I had to cancel it.
for i in $(seq 1 2 xxxxx); do echo $i; fallocate -p -o $(($i*4096)) -l 4096; done

3. Then issued fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null


Perf numbers on latest default kernel build for above cmd.

1st iteration
==============
real    0m31.684s
user    0m1.593s
sys     0m24.174s

2nd and subsequent iteration
============================
real    0m3.379s
user    0m1.300s
sys     0m2.080s


4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:-


1st iteration
==============
real    0m31.591s
user    0m1.400s
sys     0m24.243s


2nd and subsequent iteration (had to cancel it since it was taking more time then 15m)
============================
^C^C
real    15m49.884s
user    0m0.032s
sys     15m49.722s

I guess the reason why 2nd iteration with older implementation is taking
too much time is since with previous implementation we never cached
extent entries in extent_status tree. And also in 1st iteration the page
cache may get filled with lot of buffer_head entries. So maybe page
reclaims are taking more time.

While with the latest implementation using iomap_fiemap(), the call to query extent blocks is done using ext4_map_blocks(). ext4_map_blocks()
by default will also cache the extent entries into extent_status tree.
Hence during 2nd iteration, we will directly read the entries from extent_status tree and will not do any disk I/O.

-ritesh

I re-test it on the v5.9-rc1, the regression still existed. Have you tried stress-ng test cases?


Could you try stress-ng( https://kernel.ubuntu.com/~cking/stress-ng/) test cases?  The tarballs can be get from https://kernel.ubuntu.com/~cking/tarballs/stress-ng/.
The command for this case you can try "stress-ng --timeout 1 --times --verify --metrics-brief --sequential 9 --class os --minimize --exclude spawn,exec,swap"
I re-test it on the v5.8-rc6, the regression still existed.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/debug-setup/nr_threads/disk/testtime/fs/class/cpufreq_governor/ucode:

lkp-csl-2sp5/stress-ng/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/test/10%/1HDD/1s/ext4/os/performance/0x5002f01

commit:
  b2c5764262edded1b1cfff5a6ca82c3d61bb4a4a
  d3b6f23f71670007817a5d59f3fbafab2b794e8c
  v5.8-rc6

b2c5764262edded1 d3b6f23f71670007817a5d59f3f                    v5.8-rc6
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     20419 ±  3%      -4.9%      19423 ±  4%     +27.1%      25959 stress-ng.af-alg.ops
     19655 ±  3%      -5.7%      18537 ±  4%     +27.8%      25111 stress-ng.af-alg.ops_per_sec
     64.67 ±  5%     -17.0%      53.67 ± 38%     +22.2%      79.00 ± 9%  stress-ng.chdir.ops
     55.34 ±  3%     -13.3%      47.99 ± 38%     +26.4%      69.96 ± 10%  stress-ng.chdir.ops_per_sec
     64652 ±  7%     -14.1%      55545 ± 11%     -13.6%      55842 ± 6%  stress-ng.chown.ops
     64683 ±  7%     -14.1%      55565 ± 11%     -13.6%      55858 ± 6%  stress-ng.chown.ops_per_sec
      2805 ±  2%      +0.6%       2820 ±  2%    +130.0%       6452 stress-ng.clone.ops
      2802 ±  2%      +0.6%       2818 ±  2%    +129.9%       6443 stress-ng.clone.ops_per_sec
     34.67            +1.9%      35.33 ±  3%      -9.6%      31.33 ± 3%  stress-ng.copy-file.ops
     22297 ± 23%     +26.7%      28258 ±  2%     +38.1%      30783 ± 14%  stress-ng.dir.ops_per_sec
      4743 ±  7%     -12.6%       4147 ±  4%     -19.9%       3801 ± 14%  stress-ng.dirdeep.ops_per_sec
     24499 ±  8%      -9.3%      22223 ±  3%     -19.8%      19647 ± 2%  stress-ng.enosys.ops
     24494 ±  8%      -9.3%      22207 ±  3%     -19.8%      19637 ± 2%  stress-ng.enosys.ops_per_sec
     76618           -63.8%      27756 ±  5%     -55.5%      34092 ± 2%  stress-ng.fiemap.ops
     76042           -63.8%      27492 ±  5%     -55.4%      33899 ± 2%  stress-ng.fiemap.ops_per_sec
   3816791            -6.2%    3581880 ±  2%     -10.7%    3409584 ± 3%  stress-ng.file-ioctl.ops
   3817421            -6.2%    3582248 ±  2%     -10.7%    3409544 ± 3%  stress-ng.file-ioctl.ops_per_sec
     27332 ± 17%     +30.2%      35574           +37.2%      37506 ± 6%  stress-ng.filename.ops
     26222 ± 16%     +28.6%      33722           +35.6%      35567 ± 6%  stress-ng.filename.ops_per_sec
     23431 ±  2%      +0.5%      23559           +22.5%      28711 ± 4%  stress-ng.fork.ops
     23433 ±  2%      +0.5%      23558           +22.5%      28709 ± 4%  stress-ng.fork.ops_per_sec
     48074 ± 15%      -0.6%      47786 ±  9%     -42.5%      27623 stress-ng.get.ops
     48088 ± 15%      -0.6%      47793 ±  9%     -42.6%      27624 stress-ng.get.ops_per_sec
    111379            -1.4%     109815 ±  2%      -7.8%     102662 ± 4%  stress-ng.getdent.ops
    111398            -1.4%     109818 ±  2%      -7.9%     102651 ± 4%  stress-ng.getdent.ops_per_sec
    226.67 ±  2%      -1.0%     224.33           -15.1%     192.33 ± 6%  stress-ng.inotify.ops
    226.39 ±  2%      -1.3%     223.53           -15.3%     191.82 ± 6%  stress-ng.inotify.ops_per_sec
     18.00            +9.3%      19.67 ±  2%      +3.7%      18.67 ± 2%  stress-ng.iomix.ops
      4846 ± 10%      +2.9%       4984 ±  7%     -36.8%       3062 ± 7%  stress-ng.kill.ops
      4846 ± 10%      +2.8%       4982 ±  7%     -36.9%       3058 ± 7%  stress-ng.kill.ops_per_sec
    851324 ±  3%      -9.9%     766988 ±  2%      +2.3%     870703 stress-ng.lease.ops
    850238 ±  3%      -9.9%     766255 ±  2%      +2.3%     870047 stress-ng.lease.ops_per_sec
    134492 ± 14%      -7.0%     125085 ± 16%     -19.9%     107736 ± 3%  stress-ng.mlock.ops_per_sec
      1.38            +2.9%       1.42 ±  5%     -20.2%       1.10 ± 12%  stress-ng.mmapfork.ops_per_sec
   1117721 ±  5%      -9.2%    1015321 ± 20%     -36.2%     713598 stress-ng.mq.ops
   1117186 ±  5%      -9.2%    1014593 ± 20%     -36.2%     712387 stress-ng.mq.ops_per_sec
   4584592 ±  2%      -2.7%    4460470 ±  7%     -12.0%    4036627 ± 8%  stress-ng.msg.ops_per_sec
      8179 ± 10%      +7.3%       8772 ±  9%     +24.0%      10144 stress-ng.opcode.ops
      8177 ± 10%      +7.2%       8770 ±  9%     +24.0%      10141 stress-ng.opcode.ops_per_sec
      5218 ±  3%      +3.6%       5408 ±  2%     +27.3%       6642 stress-ng.reboot.ops
      5217 ±  3%      +3.6%       5407 ±  2%     +27.3%       6640 stress-ng.reboot.ops_per_sec
      1724 ± 21%     +27.9%       2205 ± 18%     +54.3%       2660 ± 19%  stress-ng.resources.ops
   1217728           -31.4%     835808 ± 14%     -18.7%     989423 ± 10%  stress-ng.revio.ops
   1217899           -31.4%     835846 ± 14%     -18.8%     989313 ± 10%  stress-ng.revio.ops_per_sec
    752664 ±  9%      -7.6%     695116 ± 10%      -0.6%     748117 ± 6%  stress-ng.seal.ops
    752957 ±  9%      -7.7%     695278 ± 10%      -0.6%     748068 ± 6%  stress-ng.seal.ops_per_sec
    628.00 ± 83%     -20.1%     502.00 ±  8%    +176.2%       1734 stress-ng.seccomp.ops
    628.22 ± 83%     -20.1%     502.08 ±  8%    +176.1%       1734 stress-ng.seccomp.ops_per_sec
   1361741 ±  7%      +4.0%    1416628 ±  8%     +17.2%    1596549 ± 9%  stress-ng.seek.ops
   1361799 ±  7%      +4.0%    1416578 ±  8%     +17.2%    1596376 ± 9%  stress-ng.seek.ops_per_sec
   1537496            +1.2%    1556468            -9.9%    1385679 stress-ng.sigrt.ops
   1532297            +1.3%    1551993            -9.9%    1381175 stress-ng.sigrt.ops_per_sec
   1324589 ± 12%     -26.6%     972626 ±  9%     -14.8%    1128923 ± 4%  stress-ng.sigsuspend.ops
   1323957 ± 12%     -26.6%     972084 ±  9%     -14.8%    1128174 ± 4%  stress-ng.sigsuspend.ops_per_sec
    806.67 ± 13%     +11.2%     896.67 ± 10%     +31.3%       1059 ± 4%  stress-ng.sockabuse.ops
    806.73 ± 13%     +11.1%     896.44 ± 10%     +31.3%       1059 ± 4%  stress-ng.sockabuse.ops_per_sec
   6224635 ±  2%      +0.8%    6274521 ±  2%     -12.8%    5427883 ± 2%  stress-ng.sockpair.ops
   5383397            -1.9%    5279526 ±  2%     -14.9%    4579084 ± 4%  stress-ng.sockpair.ops_per_sec
     22409 ±  6%      -2.8%      21784 ±  5%     -41.2%      13172 ± 9%  stress-ng.sysbadaddr.ops
     22414 ±  6%      -2.8%      21784 ±  5%     -41.3%      13167 ± 9%  stress-ng.sysbadaddr.ops_per_sec
    525.33 ± 13%      +2.6%     539.00 ± 15%     -42.1%     304.33 ± 25%  stress-ng.sysfs.ops
    524.98 ± 13%      +2.6%     538.58 ± 15%     -42.1%     303.79 ± 25%  stress-ng.sysfs.ops_per_sec
      3077 ± 12%     +19.9%       3689 ±  4%     +23.0%       3786 ± 4%  stress-ng.sysinfo.ops
      3009 ± 11%     +22.6%       3688 ±  4%     +25.7%       3783 ± 4%  stress-ng.sysinfo.ops_per_sec
   3001932            +1.2%    3038017           +15.7%    3474274 stress-ng.udp.ops
   3002618            +1.2%    3037939           +15.7%    3473606 stress-ng.udp.ops_per_sec
    184634 ± 12%      -6.9%     171823           +24.5%     229788 ± 4%  stress-ng.utime.ops
    184678 ± 12%      -7.0%     171836           +24.4%     229784 ± 4%  stress-ng.utime.ops_per_sec
      1835 ±  5%      +2.2%       1875 ±  4%      +8.2%       1986 ± 3%  stress-ng.vforkmany.ops
      1829 ±  5%      +1.9%       1863 ±  4%      +8.1%       1978 ± 3%  stress-ng.vforkmany.ops_per_sec
     14829            -1.0%      14682 ±  2%     -20.1%      11848 ± 3%  stress-ng.vm-segv.ops
     14830            -1.0%      14682 ±  2%     -29.7%      10433 ± 13%  stress-ng.vm-segv.ops_per_sec
   3527835 ±  3%      +1.9%    3596613 ±  5%    +495.6%   21010377 stress-ng.vm-splice.ops
   3528845 ±  3%      +1.9%    3597219 ±  5%    +495.4%   21011045 stress-ng.vm-splice.ops_per_sec






--
Zhengjun Xing