Re: Intel QAT on A2SDi-8C-HLN4F causes massive data corruption with dm-crypt + xfs

From: Eric Biggers
Date: Thu Mar 03 2022 - 14:21:40 EST


On Thu, Mar 03, 2022 at 01:49:03PM +0000, Giovanni Cabiddu wrote:
> On Thu, Mar 03, 2022 at 10:45:48AM +1200, Herbert Xu wrote:
> > On Wed, Mar 02, 2022 at 10:42:20PM +0000, Giovanni Cabiddu wrote:
> > >
> > > I was thinking, as an alternative, to lower the cra_priority in the QAT
> > > driver for the algorithms used by dm-crypt so they are not used by
> > > default.
> > > Is that a viable option?
> >
> > Yes I think that should work too.
> The patch below implements that solution and applies to linux-5.4.y.
> If it is ok, I can send it to stable for all kernels <= 5.4 following
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-3
>
> ---8<---
> From: Giovanni Cabiddu <giovanni.cabiddu@xxxxxxxxx>
> Date: Thu, 3 Mar 2022 11:54:07 +0000
> Subject: [PATCH] crypto: qat - drop priority of algorithms
> Organization: Intel Research and Development Ireland Ltd - Co. Reg. #308263 - Collinstown Industrial Park, Leixlip, County Kildare - Ireland
>
> The implementations of aead and skcipher in the QAT driver do not
> properly support requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
> If the HW queue is full, the driver returns -EBUSY but does not enqueue
> the request.
> This can result in applications like dm-crypt waiting indefinitely for a
> completion of a request that was never submitted to the hardware.
>
> To mitigate this problem, reduce the priority of all skcipher and aead
> implementations in the QAT driver so they are not used by default.
>
> This patch deviates from the original upstream solution, which prevents
> dm-crypt from using drivers registered with the flag
> CRYPTO_ALG_ALLOCATES_MEMORY, since a backport of that set to stable
> kernels may have too wide an effect.
>
> commit 7bcb2c99f8ed032cfb3f5596b4dccac6b1f501df upstream
> commit 2eb27c11937ee9984c04b75d213a737291c5f58c upstream
> commit fbb6cda44190d72aa5199d728797aabc6d2ed816 upstream
> commit b8aa7dc5c7535f9abfca4bceb0ade9ee10cf5f54 upstream
> commit cd74693870fb748d812867ba49af733d689a3604 upstream
>
> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@xxxxxxxxx>
> ---
> drivers/crypto/qat/qat_common/qat_algs.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
> index 6b8ad3d67481..a5c28a08fd8c 100644
> --- a/drivers/crypto/qat/qat_common/qat_algs.c
> +++ b/drivers/crypto/qat/qat_common/qat_algs.c
> @@ -1274,7 +1274,7 @@ static struct aead_alg qat_aeads[] = { {
> .base = {
> .cra_name = "authenc(hmac(sha1),cbc(aes))",
> .cra_driver_name = "qat_aes_cbc_hmac_sha1",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1291,7 +1291,7 @@ static struct aead_alg qat_aeads[] = { {
> .base = {
> .cra_name = "authenc(hmac(sha256),cbc(aes))",
> .cra_driver_name = "qat_aes_cbc_hmac_sha256",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1308,7 +1308,7 @@ static struct aead_alg qat_aeads[] = { {
> .base = {
> .cra_name = "authenc(hmac(sha512),cbc(aes))",
> .cra_driver_name = "qat_aes_cbc_hmac_sha512",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct qat_alg_aead_ctx),
> @@ -1326,7 +1326,7 @@ static struct aead_alg qat_aeads[] = { {
> static struct crypto_alg qat_algs[] = { {
> .cra_name = "cbc(aes)",
> .cra_driver_name = "qat_aes_cbc",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1348,7 +1348,7 @@ static struct crypto_alg qat_algs[] = { {
> }, {
> .cra_name = "ctr(aes)",
> .cra_driver_name = "qat_aes_ctr",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> .cra_blocksize = 1,
> .cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
> @@ -1370,7 +1370,7 @@ static struct crypto_alg qat_algs[] = { {
> }, {
> .cra_name = "xts(aes)",
> .cra_driver_name = "qat_aes_xts",
> - .cra_priority = 4001,
> + .cra_priority = 1,
> .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> .cra_blocksize = AES_BLOCK_SIZE,
> .cra_ctxsize = sizeof(struct qat_alg_ablkcipher_ctx),
>
> base-commit: 866ae42cf4788c8b18de6bda0a522362702861d7
> --
> 2.35.1
>

If these algorithms have critical bugs, which it appears they do, then IMO it
would be better to disable them (either stop registering them, or disable the
whole driver) than to leave them available with low cra_priority. Low
cra_priority doesn't guarantee that they aren't used.
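
To illustrate that last point, here is a rough userspace sketch of
priority-based selection. It is a toy model, not the kernel's lookup code:
struct toy_alg, toy_lookup(), the "toy_sw_xts" entry and its priority of 400
are all made up for illustration. Only "qat_aes_xts" and its priority of 1
come from the patch above. The assumption is that a request by generic name
picks the highest-priority match, while a request by driver name picks that
implementation directly.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an algorithm registration (hypothetical, not kernel API). */
struct toy_alg {
	const char *name;        /* generic name, e.g. "xts(aes)" */
	const char *driver_name; /* implementation-specific name */
	int priority;            /* higher wins among same-name entries */
};

/* A system with a software implementation and the deprioritized QAT one. */
static const struct toy_alg both[] = {
	{ "xts(aes)", "toy_sw_xts",  400 }, /* illustrative values */
	{ "xts(aes)", "qat_aes_xts", 1   }, /* priority from the patch */
};

/* A system where QAT is the only registered implementation. */
static const struct toy_alg only_qat[] = {
	{ "xts(aes)", "qat_aes_xts", 1 },
};

/*
 * Return the entry whose driver name matches exactly, otherwise the
 * highest-priority entry with a matching generic name, or NULL if none.
 */
static const struct toy_alg *toy_lookup(const struct toy_alg *algs, size_t n,
					const char *requested)
{
	const struct toy_alg *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Explicit driver-name request: priority is irrelevant. */
		if (strcmp(algs[i].driver_name, requested) == 0)
			return &algs[i];
		/* Generic-name request: keep the best priority seen so far. */
		if (strcmp(algs[i].name, requested) == 0 &&
		    (!best || algs[i].priority > best->priority))
			best = &algs[i];
	}
	return best;
}
```

In this model, toy_lookup(both, 2, "xts(aes)") returns the software entry,
but toy_lookup(both, 2, "qat_aes_xts") and toy_lookup(only_qat, 1, "xts(aes)")
both return the QAT entry despite its priority of 1.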

- Eric