Re: [PATCH v4] erofs: replace erofs_unzipd workqueue with per-cpu threads

From: Gao Xiang
Date: Wed Feb 08 2023 - 03:13:06 EST




On 2023/2/8 14:58, Sandeep Dhavale wrote:
On Mon, Feb 6, 2023 at 6:55 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:



On 2023/2/7 03:41, Sandeep Dhavale wrote:
On Mon, Feb 6, 2023 at 2:01 AM Gao Xiang <xiang@xxxxxxxxxx> wrote:

Hi Sandeep,

On Fri, Jan 06, 2023 at 07:35:01AM +0000, Sandeep Dhavale wrote:
Using per-cpu thread pool we can reduce the scheduling latency compared
to workqueue implementation. With this patch scheduling latency and
variation is reduced as per-cpu threads are high priority kthread_workers.

The results were evaluated on arm64 Android devices running 5.10 kernel.

The table below shows resulting improvements of total scheduling latency
for the same app launch benchmark runs with 50 iterations. Scheduling
latency is the latency between when the task (workqueue kworker vs
kthread_worker) became eligible to run to when it actually started
running.
+-------------------------+-----------+----------------+---------+
| | workqueue | kthread_worker | diff |
+-------------------------+-----------+----------------+---------+
| Average (us) | 15253 | 2914 | -80.89% |
| Median (us) | 14001 | 2912 | -79.20% |
| Minimum (us) | 3117 | 1027 | -67.05% |
| Maximum (us) | 30170 | 3805 | -87.39% |
| Standard deviation (us) | 7166 | 359 | |
+-------------------------+-----------+----------------+---------+

Background: Boot times and cold app launch benchmarks are very
important to the android ecosystem as they directly translate to
responsiveness from user point of view. While erofs provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in few scenarios.
Analysis showed that the significant variance was coming from the
scheduling cost while decompression cost was more or less the same.

Having per-cpu thread pool we can see from the above table that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022. Link to LPC 2022 slides and
talk at [1]

[1] https://lpc.events/event/16/contributions/1338/

Signed-off-by: Sandeep Dhavale <dhavale@xxxxxxxxxx>
---
V3 -> V4
* Updated commit message with background information
V2 -> V3
* Fix a warning Reported-by: kernel test robot <lkp@xxxxxxxxx>
V1 -> V2
* Changed name of kthread_workers from z_erofs to erofs_worker
* Added kernel configuration to run kthread_workers at normal or
high priority
* Added cpu hotplug support
* Added wrapped kthread_workers under worker_pool
* Added one unbound thread in a pool to handle a context where
we already stopped per-cpu kthread worker
* Updated commit message

I've just modified your v4 patch based on erofs -dev branch with
my previous suggestion [1], but I haven't tested it.

Could you help check if the updated patch looks good to you and
test it on your side? If there are unexpected behaviors, please
help update as well, thanks!
Thanks Xiang, I was working on the same. I see that you have cleaned it up.
I will test it and report/fix any problems.

Thanks,
Sandeep.

Thanks! Look forward to your test. BTW, we have < 2 weeks for 6.3, so I'd
like to fix it this week so that we could catch 6.3 merge window.


I've fixed some cpu hotplug errors as below and added to a branch for 0day CI
testing.

Hi Xiang,
With this version of the patch I have tested
- Multiple device reboot test
- Cold App launch tests
- Cold App launch tests with cpu offline/online

All tests ran successfully and no issue was observed.

Okay, thanks! I will resend & submit this version for -next now
and test on my side if no other concerns.

Thanks,
Gao XIang