Re: [PATCH v4] arm64: Enable KCSAN

From: Kefeng Wang
Date: Tue Dec 14 2021 - 23:08:10 EST



On 2021/12/15 2:24, Mark Rutland wrote:
On Sat, Dec 11, 2021 at 09:17:34PM +0800, Kefeng Wang wrote:
This patch enables KCSAN for arm64, with updates to build rules
to not use KCSAN for several incompatible compilation units.

Recent GCC version(at least GCC10) made outline-atomics as the
default option(unlike Clang), which will cause linker errors
for kernel/kcsan/core.o. Disables the out-of-line atomics by
no-outline-atomics to fix the linker errors.

Meanwhile, as Mark said[1], some latent issues are needed to be
fixed which isn't just a KCSAN problem, we make the KCSAN depends
on EXPERT for now.

Tested selftest and kcsan_test(built with GCC11 and Clang 13),
and all passed.

[1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N
Acked-by: Marco Elver <elver@xxxxxxxxxx> # kernel/kcsan
Tested-by: Joey Gouly <joey.gouly@xxxxxxx>
Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
---

Tested on Qemu with clang 13 / gcc 11, based on 5.16-rc3 by Kefeng.
Tested with gcc 11 and clang 14.0 (built from git) on qemu and FVP by Joey.
Hi,

Sorry for the silence on v3; I got a little busy elsewhere and wasn't able to
test that, but I have given this v4 a fairly intensive test with KCSAN combined
with other config options I use when fuzzing.

I eyeballed the output from a bunch of compilers (noted below), and there's
still a fair amount of potentially-unsound instrumentation, but I'm happy with
that so long as we have the EXPERT dependency. I agree we can fix those
(latent) issues with follow-up work.

I have one minor comment below about adding a comment, but aside from that,
this looks good and I didn't encounter any unexpected issues while testing, so:

Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>
Tested-by: Mark Rutland <mark.rutland@xxxxxxx>
Many thanks for your , Mark.
v4:
- drop Clang version as commit 8cdd23c23c3d ("arm64: Restrict ARM64_BTI_KERNEL
to clang 12.0.0 and newer"), suggested by Nathan Chancellor
v3:
- add EXPERT and CLANG_VERSION depends suggested by Mark Rutland
v2:
- tested on GCC11 and disable outline-atomics for kernel/kcsan/core.c
suggested by Marco Elver

arch/arm64/Kconfig | 1 +
arch/arm64/kernel/vdso/Makefile | 1 +
arch/arm64/kvm/hyp/nvhe/Makefile | 1 +
kernel/kcsan/Makefile | 1 +
4 files changed, 4 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4ff73299f8a9..2cc9dea55e00 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -150,6 +150,7 @@ config ARM64
select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
select HAVE_ARCH_KASAN_HW_TAGS if (HAVE_ARCH_KASAN && ARM64_MTE)
+ select HAVE_ARCH_KCSAN if EXPERT
It might be worth adding:

/* Some instrumentation may be unsound */

... immediately above this to clarify why we added the EXPERT dependency here.

Other than that trivial comment, this looks good to me!

For future reference, below is an info dump about how I've tested this, and
which latent issues KCSAN reported during testing. Feel free to ignore for now.
:)

Our tester wants  this feature to test our kernel, this is  a good start on ARM64,

more people could be getting involved to enhance it, We could fix the KCSAN

Warning one by one :)


... the summary of which is below:

* BUG: KCSAN: data-race in capable / cgroup_freezer_migrate_task
* BUG: KCSAN: data-race in console_unlock / vprintk_emit
* BUG: KCSAN: data-race in __d_add / __d_add
* BUG: KCSAN: data-race in __do_sys_prctl / do_task_stat
* BUG: KCSAN: data-race in do_task_stat / sigprocmask
* BUG: KCSAN: data-race in ep_modify / ep_poll_callback
* BUG: KCSAN: data-race in _find_next_bit+0x60/0x194
* BUG: KCSAN: data-race in folio_mark_accessed / workingset_activation
* BUG: KCSAN: data-race in ktime_get / timekeeping_advance
* BUG: KCSAN: data-race in more_used_split+0x38/0x60
* BUG: KCSAN: data-race in mutex_spin_on_owner+0xcc/0x150
* BUG: KCSAN: data-race in rwsem_spin_on_owner+0xa8/0x13c
* BUG: KCSAN: data-race in start_dir_add+0x34/0xd4
* BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_idle_stop_tick
* BUG: KCSAN: data-race in timekeeping_delta_to_ns+0x34/0x94
* BUG: KCSAN: data-race in virtqueue_get_buf_ctx+0xe8/0x458
* BUG: KCSAN: data-race in virtqueue_get_buf_ctx_split+0x68/0x2dc
* BUG: KCSAN: data-race in vprintk_emit+0x320/0x494

By far the most commonly reported races were in "{mutex,rwsem}_spin_on_owner",
followed by "timekeeping_delta_to_ns" followed by "virtqueue_get_buf_ctx*". So
those look like the ones to focus on in the near term.

When using GCC I would very often see reports I didn't see with LLVM, for
"vprintk_emit" and "ktime_get / timekeeping_advance". I don't know if that's
down to something getting optimized away or something getting instrumented
differently.