Re: vchiq: Performance regression since 5.18-rc1

From: Stefan Wahren
Date: Mon May 23 2022 - 03:37:08 EST


Hi Paul,

On 23.05.22 at 06:48, Paul E. McKenney wrote:
On Sun, May 22, 2022 at 05:11:36PM +0200, Stefan Wahren wrote:
Hi Paul,

On 22.05.22 at 01:46, Paul E. McKenney wrote:
On Sun, May 22, 2022 at 01:22:00AM +0200, Stefan Wahren wrote:
Hi,

while testing the staging/vc04_services/interface/vchiq_arm driver with my
Raspberry Pi 3 B+ (multi_v7_defconfig) I noticed a huge performance
regression since commit ff042f4a9b050895a42cae893cc01fa2ca81b95c ("mm:
lru_cache_disable: replace work queue synchronization with synchronize_rcu").

I usually run "vchiq_test -f 1" to check that the driver is still working [1].

Before commit:

real    0m1,500s
user    0m0,068s
sys    0m0,846s

After commit:

real    7m11,449s
user    0m2,049s
sys    0m0,023s

Best regards

[1] - https://github.com/raspberrypi/userland
Please feel free to try the patch shown below. Or the pair of patches
from Rik here:

https://lore.kernel.org/lkml/20220218183114.2867528-2-riel@xxxxxxxxxxx/
https://lore.kernel.org/lkml/20220218183114.2867528-3-riel@xxxxxxxxxxx/
I tried your patch and Rik's patches, but in both cases vchiq_test runs for
about 7 minutes instead of ~1 second.
That is surprising. Do you boot with rcupdate.rcu_normal=1?
No, not explicitly.
That would
nullify my patch, but I would expect that Rik's patch would still provide
increased performance even in that case.
I will retest with a fresh SD card image.

Could you please characterize where the slowdown is occurring?

Unfortunately I don't have deep insight into the driver or the vchiq_test tool, just a user's view.

Do you think an strace would be a good starting point?

@Phil: Any advice on how to analyse this issue?


Thanx, Paul

Best regards

There is work ongoing to produce something better, but ongoing slowly.
Especially my part of that work.

Thanx, Paul

------------------------------------------------------------------------

From paulmck@xxxxxxxxxx Mon Feb 14 11:05:49 2022
Date: Mon, 14 Feb 2022 11:05:49 -0800
From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
To: clm@xxxxxx
Cc: riel@xxxxxxxxxxx, viro@xxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx,
linux-fsdevel@xxxxxxxxxxxxxxx, kernel-team@xxxxxx
Subject: [PATCH RFC fs/namespace] Make kern_unmount() use
synchronize_rcu_expedited()
Message-ID: <20220214190549.GA2815154@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@xxxxxxxxxx
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Status: RO
Content-Length: 1036
Lines: 32

Experimental. Not for inclusion. Yet, anyway.

Freeing large numbers of namespaces in quick succession can result in
a bottleneck on the synchronize_rcu() invoked from kern_unmount().
This patch applies the synchronize_rcu_expedited() hammer to allow
further testing and fault isolation.

Hey, at least there was no need to change the comment! ;-)

Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: <linux-fsdevel@xxxxxxxxxxxxxxx>
Cc: <linux-kernel@xxxxxxxxxxxxxxx>
Not-yet-signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

---

 namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 40b994a29e90d..79c50ad0ade5b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4389,7 +4389,7 @@ void kern_unmount(struct vfsmount *mnt)
 	/* release long term mount so mount point can be released */
 	if (!IS_ERR_OR_NULL(mnt)) {
 		real_mount(mnt)->mnt_ns = NULL;
-		synchronize_rcu(); /* yecchhh... */
+		synchronize_rcu_expedited(); /* yecchhh... */
 		mntput(mnt);
 	}
 }