Re: [PATCH] xfs: optimise xfs_mod_icount/ifree when delta < 0

From: Dave Chinner
Date: Mon Nov 04 2019 - 15:49:20 EST


On Mon, Nov 04, 2019 at 07:29:40PM +0800, Shaokun Zhang wrote:
> From: Yang Guo <guoyang2@xxxxxxxxxx>
>
> percpu_counter_compare will be called by xfs_mod_icount/ifree to check
> whether the counter less than 0 and it is a expensive function.
> let's check it only when delta < 0, it will be good for xfs's performance.

Hmmm. I don't recall this as being expensive.

How did you find this? Can you please always document how you found
the problem being addressed in the commit message so that we don't
then have to ask how the problem being fixed is reproduced.

> Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
> Signed-off-by: Yang Guo <guoyang2@xxxxxxxxxx>
> Signed-off-by: Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx>
> ---
> fs/xfs/xfs_mount.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index ba5b6f3b2b88..5e8314e6565e 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1174,6 +1174,9 @@ xfs_mod_icount(
> int64_t delta)
> {
> percpu_counter_add_batch(&mp->m_icount, delta, XFS_ICOUNT_BATCH);
> + if (delta > 0)
> + return 0;
> +
> if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) {
> ASSERT(0);
> percpu_counter_add(&mp->m_icount, -delta);

I struggle to see how this is expensive when you have more than
num_online_cpus() * XFS_ICOUNT_BATCH inodes allocated.
__percpu_counter_compare() will always take the fast path so ends up
being very little code at all.

> @@ -1188,6 +1191,9 @@ xfs_mod_ifree(
> int64_t delta)
> {
> percpu_counter_add(&mp->m_ifree, delta);
> + if (delta > 0)
> + return 0;
> +
> if (percpu_counter_compare(&mp->m_ifree, 0) < 0) {
> ASSERT(0);
> percpu_counter_add(&mp->m_ifree, -delta);

This one might have some overhead because the count is often at or
around zero, but I haven't noticed it being expensive in kernel
profiles when creating/freeing hundreds of thousands of inodes every
second.

IOWs, we typically measure the overhead of such functions by kernel
profile. Creating ~200,000 inodes a second, so hammering the icount
and ifree counters, I see:

0.16% [kernel] [k] percpu_counter_add_batch
0.03% [kernel] [k] __percpu_counter_compare

Almost nothing - it's way down the long tail of noise in the
profile.

IOWs, the CPU consumed by percpu_counter_compare() is low that
optimisation isn't going to produce any measurable performance
improvement. Hence it's not really something we've concerned
ourselves about. The profile is pretty much identical for removing
hundreds of thousands of files a second, too, so there really isn't
any performance gain to be had here.

If you want to optimise code to make it faster and show a noticable
performance improvement, start by running kernel profiles while your
performance critical workload is running. Then look at what the
functions and call chains that consume the most CPU and work out how
to do them better. Those are the places that optimisation will
result in measurable performance gains....

Cheers,

Dave.

--
Dave Chinner
david@xxxxxxxxxxxxx