Re: [PATCH] Enable '-Werror' by default for all kernel builds

From: Arnd Bergmann
Date: Tue Sep 07 2021 - 05:11:38 EST


On Tue, Sep 7, 2021 at 4:32 AM Nathan Chancellor <nathan@xxxxxxxxxx> wrote:
>
> arm32-allmodconfig.log: crypto/wp512.c:782:13: error: stack frame size (1176) exceeds limit (1024) in function 'wp512_process_buffer' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/firmware/tegra/bpmp-debugfs.c:294:12: error: stack frame size (1256) exceeds limit (1024) in function 'bpmp_debug_show' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/firmware/tegra/bpmp-debugfs.c:357:16: error: stack frame size (1264) exceeds limit (1024) in function 'bpmp_debug_store' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:3043:6: error: stack frame size (1384) exceeds limit (1024) in function 'bw_calcs' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:77:13: error: stack frame size (5560) exceeds limit (1024) in function 'calculate_bandwidth' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/mtd/chips/cfi_cmdset_0001.c:1872:12: error: stack frame size (1064) exceeds limit (1024) in function 'cfi_intelext_writev' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/ntb/hw/idt/ntb_hw_idt.c:1041:27: error: stack frame size (1032) exceeds limit (1024) in function 'idt_scan_mws' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/staging/fbtft/fbtft-core.c:902:12: error: stack frame size (1072) exceeds limit (1024) in function 'fbtft_init_display_from_property' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/staging/fbtft/fbtft-core.c:992:5: error: stack frame size (1064) exceeds limit (1024) in function 'fbtft_init_display' [-Werror,-Wframe-larger-than]
> arm32-allmodconfig.log: drivers/staging/rtl8723bs/core/rtw_security.c:1288:5: error: stack frame size (1040) exceeds limit (1024) in function 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than]
> arm32-fedora.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:3043:6: error: stack frame size (1376) exceeds limit (1024) in function 'bw_calcs' [-Werror,-Wframe-larger-than]
> arm32-fedora.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:77:13: error: stack frame size (5384) exceeds limit (1024) in function 'calculate_bandwidth' [-Werror,-Wframe-larger-than]
>
> Aside from the dce_calcs.c warnings, these do not seem too bad. I
> believe allmodconfig turns on UBSAN but it could also be aggressive
> inlining by clang. I intend to look at all -Wframe-large-than warnings
> closely later.

I've had them close to zero in the past, but a couple of new ones came in.

The amdgpu ones are probably not fixable unless they stop using 64-bit
floats in the kernel for
random calculations. The crypto/* ones tend to be compiler bugs, but hard to fix

> It appears that both Arch Linux and Fedora define CONFIG_FRAME_WARN
> as 1024, below its default of 2048. I am not sure these look particurly
> scary (although there are some that are rather large that need to be
> looked at).

For 64-bit, you usually need 1280 bytes stack space to get a
reasonably clean build,
anything that uses more than that tends to be a bug in the code but we
never warned
about those by default as the default warning limit in defconfig is 2048.

I think the distros using 1024 did that because they use a common base config
for 32-bit and 64-bit targets.

> I suspect this is a backend problem because these do not really appear
> in any other configurations (might also be something with a sanitizer?)

Agreed. Someone needs to bisect the .config or the compiler flags to see what
triggers them.

> s390x-defconfig.log: include/asm-generic/io.h:464:31: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:477:61: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:490:61: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:501:33: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:511:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:521:59: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:609:20: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:617:20: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:625:20: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:634:21: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:643:21: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
> s390x-defconfig.log: include/asm-generic/io.h:652:21: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic]
>
> This affected all s390x configs I test. fs/btrfs force enables W=1 so we
> get these. This is known and had a solution rejected at pull time:
>
> https://github.com/ClangBuiltLinux/linux/issues/1285
> https://lore.kernel.org/r/20210510145234.594814-1-schnelle@xxxxxxxxxxxxx/
> https://lore.kernel.org/r/CAK8P3a2oZ-+qd3Nhpy9VVXCJB3DU5N-y-ta2JpP0t6NHh=GVXw@xxxxxxxxxxxxxx/

I posted a new idea for a patch, but it needs more work. I'm happy to work with
any volunteers that want to help tighten the Kconfig dependencies to ensure that
those drivers are only built on architectures that provide I/O port accesses.

> x86_64-allmodconfig-O3.log:drivers/net/ethernet/microchip/sparx5/sparx5_calendar.c:566:5: error: stack frame size (2504) exceeds limit (2048) in function 'sparx5_config_dsm_calendar' [-Werror,-Wframe-larger-than]
>
> Probably aggressive inlining due to testing -O3.

If inlining causes it, it was already bad without the inlining. It
looks like there are
some large arrays on the stack of some of the called functions, so a driver fix
is needed anyway.

> x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:452:13: error: stack frame size (1800) exceeds limit (1280) in function 'dcn_bw_calc_rq_dlg_ttu' [-Werror,-Wframe-larger-than]
> x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1657:6: error: stack frame size (1336) exceeds limit (1280) in function 'dml21_rq_dlg_get_dlg_reg' [-Werror,-Wframe-larger-than]
> x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.c:1831:6: error: stack frame size (1352) exceeds limit (1280) in function 'dml30_rq_dlg_get_dlg_reg' [-Werror,-Wframe-larger-than]
> x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_rq_dlg_calc_31.c:1676:6: error: stack frame size (1336) exceeds limit (1280) in function 'dml31_rq_dlg_get_dlg_reg' [-Werror,-Wframe-larger-than]
> x86_64-alpine.log:drivers/vhost/scsi.c:1831:12: error: stack frame size (1320) exceeds limit (1280) in function 'vhost_scsi_release' [-Werror,-Wframe-larger-than]
>
> Another instance where distros lower CONFIG_FRAME_WARN below the 2048
> default. Again, none look particularly scary but should still probably
> be dealt with.

I would argue that they are still scary and should be addressed in the
code, it's just that
we don't see them on build bots that use the 2048 byte default.

Arnd