Re: [PATCH kcsan 27/32] kcsan: Add option to allow watcher interruptions

From: Paul E. McKenney
Date: Thu Mar 12 2020 - 14:03:30 EST


On Mon, Mar 09, 2020 at 12:04:15PM -0700, paulmck@xxxxxxxxxx wrote:
> From: Marco Elver <elver@xxxxxxxxxx>
>
> Add option to allow interrupts while a watchpoint is set up. This can be
> enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
> parameter 'kcsan.interrupt_watcher=1'.
>
> Note that currently not all safe per-CPU access primitives and patterns
> are accounted for, which could result in false positives. For example,
> asm-generic/percpu.h uses plain operations, which by default are
> instrumented. On interrupts and subsequent accesses to the same
> variable, KCSAN would currently report a data race with this option.
>
> Therefore, this option should currently remain disabled by default, but
> may be enabled for specific test scenarios.
>
> To avoid new warnings, change all uses of smp_processor_id() to use the
> raw version (as already done in kcsan_found_watchpoint()). The exact SMP
> processor id is for informational purposes in the report, and
> correctness is not affected.
>
> Signed-off-by: Marco Elver <elver@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

And I get silent hangs that bisect to this patch when running the
following rcutorture command, run in the kernel source tree on a
12-hardware-thread laptop:

bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y" --configs TREE03

It works fine on some (but not all) of the other rcutorture test
scenarios. It fails on TREE01, TREE02, TREE03, and TREE09. The common
thread is that these TREE scenarios are all PREEMPT=y. So are RUDE01,
SRCU-P, TASKS01, and TASKS03, but those scenarios do not hammer on
Tree RCU, and thus see far less interrupt activity and the like.
Given that this commit adds an interrupt-related feature, this seems
like expected (mis)behavior.
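For reference, here is a minimal user-space sketch of the false-positive scenario described in the changelog, assuming a single, heavily simplified watchpoint slot. It models only the report, not the hangs above. The names (struct watchpoint, check_access(), fake_irq_handler(), run_scenario(), pcpu_var) are hypothetical and are not KCSAN's internals. When interrupts are allowed during the watch window, an interrupt handler touching the same per-CPU variable hits the task's armed watchpoint and generates a report.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of a single KCSAN-style watchpoint slot.
 * All names are illustrative only; this is not KCSAN's implementation.
 */
struct watchpoint {
	bool armed;
	const void *addr;
};

static struct watchpoint wp;
static int reports;
static unsigned long pcpu_var; /* stands in for a per-CPU variable */

/* Every instrumented plain access checks the armed watchpoint. */
static void check_access(const void *addr, const char *ctx)
{
	if (wp.armed && wp.addr == addr) {
		reports++;
		printf("data race reported: %s hit watched address\n", ctx);
	}
}

/* Simulated interrupt handler that updates the same per-CPU variable. */
static void fake_irq_handler(void)
{
	check_access(&pcpu_var, "irq");
	pcpu_var++;
}

/*
 * Task-side access: arm the watchpoint, stall for the watch window, then
 * disarm. When irqs_allowed is true, the interrupt is modeled as arriving
 * inside the window, so it hits the watchpoint set up on this same CPU.
 * Returns the number of reports generated by this scenario.
 */
int run_scenario(bool irqs_allowed)
{
	reports = 0;
	wp.addr = &pcpu_var;
	wp.armed = true;
	if (irqs_allowed)
		fake_irq_handler();
	wp.armed = false;
	/* With interrupts masked, the handler runs only after disarming. */
	if (!irqs_allowed)
		fake_irq_handler();
	return reports;
}
```

Calling run_scenario(false) yields no report, while run_scenario(true) yields one, which is why the option is expected to stay disabled by default.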

Can you reproduce this? If not, are there any diagnostics I can add to
my testing? Or a diagnostic patch I could apply?

Thanx, Paul