Re: [PATCH v5] erofs: add per-cpu threads for decompression as an option

From: Sandeep Dhavale
Date: Thu Feb 23 2023 - 19:14:05 EST


On Thu, Feb 23, 2023 at 11:08 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2023/2/24 02:52, Gao Xiang wrote:
> > Hi Eric,
> >
> > On 2023/2/24 02:29, Eric Biggers wrote:
> >> Hi,
> >>
> >> On Wed, Feb 08, 2023 at 05:33:22PM +0800, Gao Xiang wrote:
> >>> From: Sandeep Dhavale <dhavale@xxxxxxxxxx>
> >>>
> >>> Using per-cpu thread pool we can reduce the scheduling latency compared
> >>> to workqueue implementation. With this patch scheduling latency and
> >>> variation is reduced as per-cpu threads are high priority kthread_workers.
> >>>
> >>> The results were evaluated on arm64 Android devices running 5.10 kernel.
> >>
> >> I see that this patch was upstreamed. Meanwhile, commit c25da5b7baf1d
> >> ("dm verity: stop using WQ_UNBOUND for verify_wq") was also upstreamed.
> >>
> >> Why is this more complex solution better than simply removing WQ_UNBOUND?
> >
> > I do think it's a specific issue on specific arm64 hardwares (assuming
> > qualcomm, I don't know) since WQ_UNBOUND decompression once worked well
> > on the hardwares I once used (I meant Hisilicon, and most x86_64 CPUs,
> > I tested at that time) compared with per-cpu workqueue.
> >
> > Also RT threads are also matchable with softirq approach. In addition,
> > many configurations work without dm-verity.
>
> Also for dm-verity use cases, EROFS will reuse the dm-verity context
> directly rather than kick off a new context. Yet I'm not sure there
> are still users using EROFS without dm-verity as I said above.
>
> Anyway, the original scheduling issue sounds strange for me (with my
> own landing experiences) in the beginning, and I have no way to
> confirm the cases. Just hopefully it could be resolved from the
> developer inputs and finally benefit to end users.
>
> I've already did my own stress test with this new configuration as
> well without explicit regression.
>
Hi Eric,
>From the dm-verity patch description of removing WQ_UNBOUND it seems
Nathan saw the EROFS wait time was reduced by 51% whereas high pri per-cpu
threads showed me sched latency reduced on avg by ~80%.

So from the description at least it does not look like both patches have
equal benefits. I can't argue about the size and complexity of removing
WQ_UNBOUND if it gives the same benefits, that would have been great.

I will do the app launch tests again to compare these and share.

Thanks,
Sandeep.


> >
> > I don't have more time to dig into it for now but it's important to
> > resolve this problem on some arm64 hardwares first. Also it's an
> > optional stuff, if the root cause of workqueue issue can be resolved,
> > we could consider drop it then.
> >
> > Thsnka,
> > Gao Xiang
> >
> >>
> >> - Eric