Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M

From: Nathan Chancellor
Date: Wed May 06 2020 - 01:12:04 EST


On Tue, May 05, 2020 at 03:53:45PM +0200, Arnd Bergmann wrote:
> When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
> which ends up confusing the register allocator to the point of generating
> rather bad code and using more than the warning limit for stack frames:
>
> warning: stack frame size of 1200 bytes in function 'blake2b_compress' [-Wframe-larger-than=]
>
> Forcing it to not unroll the final loop avoids this problem.
>
> Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
> Signed-off-by: Arnd Bergmann <arnd@xxxxxxxx>
> ---
> crypto/blake2b_generic.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
> index 1d262374fa4e..0ffd8d92e308 100644
> --- a/crypto/blake2b_generic.c
> +++ b/crypto/blake2b_generic.c
> @@ -129,7 +129,9 @@ static void blake2b_compress(struct blake2b_state *S,
> ROUND(9);
> ROUND(10);
> ROUND(11);
> -
> +#ifdef CONFIG_CC_IS_CLANG

Given your comment in the bug:

"The code is written to assume no loops are unrolled"

Would it make sense to make this unconditional, so that the result no
longer depends on each compiler's unrolling heuristics?

> +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> +#endif
> for (i = 0; i < 8; ++i)
> S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
> }
> --
> 2.26.0
>
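
To make the question above concrete, here is a rough sketch of what an
unconditional version might look like. NO_UNROLL and fold_state() are
hypothetical names and not part of the patch. The sketch assumes that clang
accepts "#pragma nounroll" (as the patch does) and that GCC 8 or newer
accepts "#pragma GCC unroll 1". On any other compiler the macro expands to
nothing.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: choose a "do not unroll"
 * pragma based on the compiler. clang also defines __GNUC__, so it has
 * to be checked first.
 */
#if defined(__clang__)
#define NO_UNROLL _Pragma("nounroll")
#elif defined(__GNUC__) && __GNUC__ >= 8
#define NO_UNROLL _Pragma("GCC unroll 1")
#else
#define NO_UNROLL /* no known pragma, leave the loop to the compiler */
#endif

/*
 * Stand-in for the final loop of blake2b_compress(): fold the 16-word
 * working vector v back into the 8-word chaining value h.
 */
static void fold_state(uint64_t h[8], const uint64_t v[16])
{
	int i;

	NO_UNROLL
	for (i = 0; i < 8; ++i)
		h[i] = h[i] ^ v[i] ^ v[i + 8];
}
```

This only shows the shape of the idea. Whether it actually keeps the stack
frame under the warning limit on ARMv7-M, or has any effect on GCC's code
generation, would need to be checked against a real build.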