Re: [GIT PULL] fsverity fixes for v6.3-rc4

From: Eric Biggers
Date: Mon Mar 20 2023 - 19:16:50 EST


On Mon, Mar 20, 2023 at 03:31:13PM -0700, Linus Torvalds wrote:
> On Mon, Mar 20, 2023 at 2:07 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > Nathan Huckleberry (1):
> > fsverity: Remove WQ_UNBOUND from fsverity read workqueue
>
> There's a *lot* of other WQ_UNBOUND users. If it performs that badly,
> maybe there is something wrong with the workqueue code.
>
> Should people be warned to not use WQ_UNBOUND - or is there something
> very special about fsverity?
>
> Added Tejun to the cc. With one of the main documented reasons for
> WQ_UNBOUND being performance (both implicit "try to start execution of
> work items as soon as possible") and explicit ("CPU intensive
> workloads which can be better managed by the system scheduler"), maybe
> it's time to reconsider?
>
> WQ_UNBOUND adds a fair amount of complexity and special cases to the
> workqueues, and this is now the second "let's remove it because it's
> hurting things in a big way".
>
> Linus

So, Nathan has been doing most of the investigation and testing on this, and
he's out of office at the moment.

But, my understanding is that since modern CPUs have acceleration for all the
common crypto algorithms (including fsverity's SHA-256), the work items just
don't take long enough for the overhead of a context switch to be worth it.
WQ_UNBOUND seems to be optimized for much longer running work items.

Additionally, the WQ_UNBOUND overhead is particularly bad on arm64. We aren't
sure of the reason for this. Nathan thinks this is probably related to overhead
of saving/restoring the FPU+SIMD state. My theory is that it's mainly caused by
heterogeneous processors, where work that would ordinarily run on the fastest
CPU core gets scheduled on a slow CPU core. Maybe it's a combination of both.

WQ_UNBOUND has been shown to be detrimental to EROFS decompression and to
dm-verity too, so this isn't specific to fsverity. (fscrypt is still under
investigation. I'd guess the same applies, but it's been less of a priority
since fscrypt doesn't use a workqueue when inline encryption is being used.)

These are all "I/O post-processing cases", though, so all sort of similar.

- Eric