[performance regression, bisected] scheduler: should_we_balance() kills filesystem performance

From: Dave Chinner
Date: Tue Sep 10 2013 - 00:03:23 EST


Hi folks,

I just updated my performance test VM to the current 3.12-git
tree after the XFS dev branch was merged. The first test I ran,
a 16-way concurrent fsmark test that creates lots of files, gave
me a number about 30% lower than I expected: ~180k files/s when
I was expecting somewhere around 250k files/s.

I did a bisect, and the bisect landed on this commit:

commit 23f0d2093c789e612185180c468fa09063834e87
Author: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Date: Tue Aug 6 17:36:42 2013 +0900

sched: Factor out code to should_we_balance()

Now checking whether this cpu is appropriate to balance or not
is embedded into update_sg_lb_stats() and this checking has no direct
relationship to this function. There is not enough reason to place
this checking at update_sg_lb_stats(), except saving one iteration
for sched_group_cpus.
....

Now, I couldn't revert that patch by itself, but I reverted the
whole series of about 10 scheduler patches it belongs to from a
current TOT and the regression went away. Hence I'm pretty confident
that this is the patch causing the issue, as I've verified it in
more than one way and the difference between "good" and "bad" was
significantly greater than the variance of the test (1.5-2 stddev
difference).
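
For anyone wanting to put a number on run-to-run variance for a
comparison like this, here is a minimal sketch. It assumes one
files/s figure per line on stdin; the mean_stddev helper is
hypothetical and is not the stats tool used in the test pipeline.
It prints the mean and the sample standard deviation.

```shell
# mean_stddev: read one number per line from stdin and print
# "<mean> <sample stddev>", both to two decimal places.
mean_stddev() {
    awk '
        NF { n++; sum += $1; sumsq += $1 * $1 }   # skip blank lines
        END {
            if (n < 2) {
                print "need at least two samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            # Sample variance (n - 1 in the denominator).
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0                  # guard against rounding
            printf "%.2f %.2f\n", mean, sqrt(var)
        }'
}
```

Feeding it the per-run results of the "good" and the "bad" kernel
separately gives the two means and the spread to compare them
against.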

In more detail:

                    v4 filesystem    v5 filesystem
3.11+xfsdev:        220k files/s     225k files/s
3.12-git:           180k files/s     185k files/s
3.12-git-revert:    245k files/s     247k files/s
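
As a quick check of the size of the regression, the relative drops
can be worked out from the figures above. This is only a sketch:
the pct_drop helper is hypothetical, and the inputs are the rounded
numbers quoted in this mail, in thousands of files/s.

```shell
# pct_drop BASELINE MEASURED: print how far MEASURED falls below
# BASELINE, as a percentage with one decimal place.
pct_drop() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Figures in thousands of files/s, taken from the results above.
echo "expected 250k vs measured 180k: $(pct_drop 250 180)% lower"
echo "v4, revert vs 3.12-git:         $(pct_drop 245 180)% lower"
echo "v5, revert vs 3.12-git:         $(pct_drop 247 185)% lower"
```

That works out to 28.0%, 26.5% and 25.1% respectively, which is
consistent with the "about 30% lower" first impression.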

The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
image sitting on a 4-way RAID0 SSD array formatted with XFS and the
image file is accessed by virtio+direct IO. The fsmark command line
is:

time ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
    -d /mnt/scratch/0 -d /mnt/scratch/1 \
    -d /mnt/scratch/2 -d /mnt/scratch/3 \
    -d /mnt/scratch/4 -d /mnt/scratch/5 \
    -d /mnt/scratch/6 -d /mnt/scratch/7 \
    -d /mnt/scratch/8 -d /mnt/scratch/9 \
    -d /mnt/scratch/10 -d /mnt/scratch/11 \
    -d /mnt/scratch/12 -d /mnt/scratch/13 \
    -d /mnt/scratch/14 -d /mnt/scratch/15 \
    | tee >(stats --trim-outliers | tail -1 1>&2)
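
The sixteen -d arguments could also be generated rather than typed
out, which makes it easier to vary the degree of concurrency. A
minimal sketch, assuming a POSIX shell and a base path without
whitespace; the fsmark_dir_args helper is hypothetical and only
prints the command instead of running it:

```shell
# fsmark_dir_args BASE COUNT: print "-d BASE/0 -d BASE/1 ..." up to
# BASE/(COUNT-1) on one line. Assumes BASE contains no whitespace,
# because the caller relies on word splitting.
fsmark_dir_args() {
    base=$1
    count=$2
    i=0
    args=""
    while [ "$i" -lt "$count" ]; do
        args="$args -d $base/$i"
        i=$((i + 1))
    done
    printf '%s\n' "${args# }"   # drop the leading space
}

# Print the resulting command instead of running it. The $(...) is
# deliberately unquoted so each "-d" and path becomes its own word.
echo ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
    $(fsmark_dir_args /mnt/scratch 16)
```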

The workload on XFS runs almost CPU bound - the effect of the
above patch was that there was a lot of idle time left in the
system. The workload consumed the same amount of user and system
CPU; it was just that instantaneous CPU usage was reduced by 20-30%
and the elapsed time was increased by 20-30%.
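
One way to observe that idle time is to sample the aggregate "cpu"
line of /proc/stat before and after a run and compare the deltas.
The sketch below is only an illustration: the idle_pct helper is
hypothetical, and counting iowait as idle is an assumption you may
not want.

```shell
# idle_pct BEFORE AFTER: percentage of CPU time spent idle between
# two samples of the aggregate "cpu" line from /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal
# (guest time is already accounted in user/nice, so it is skipped).
idle_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6          # idle + iowait (an assumption)
            if (NR == 1) { t0 = total; i0 = idle; next }
            if (total == t0) { print "no time elapsed" > "/dev/stderr"; exit 1 }
            printf "%.1f\n", (idle - i0) / (total - t0) * 100
        }'
}

# Example use around a benchmark run:
#   before=$(grep '^cpu ' /proc/stat)
#   ...run the workload...
#   after=$(grep '^cpu ' /proc/stat)
#   idle_pct "$before" "$after"
```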

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx