Re: KCSAN: data-race in __alloc_file / __alloc_file

From: Marco Elver
Date: Wed Nov 13 2019 - 16:33:49 EST


On Wed, 13 Nov 2019, Linus Torvalds wrote:

> On Wed, Nov 13, 2019 at 7:00 AM Marco Elver <elver@xxxxxxxxxx> wrote:
> >
> > Just to summarize the options we've had so far:
> > 1. Add a comment, and let the tool parse it somehow.
> > 2. Add attribute to variables.
> > 3. Add some new macro to use with expressions, which doesn't do
> > anything if the tool is disabled. E.g. "racy__(counter++)",
> > "lossy__(counter++);" or any suitable name.
>
> I guess I could live with "data_race(x)" or something simple like
> that, assuming we really can just surround a whole expression with it,
> and we don't have to make a hundred different versions for the
> different cases ("racy plain assignment" vs "racy statistics update"
> vs "racy u64 addition" etc etc).
>
> I just want the source code to be very legible, which is one of the
> problems with the ugly READ_ONCE() conversions.
>
> Part of that "legible source code" implies no crazy double
> underscores. But a plain "data_race(x)" might not look too bad, and
> would be easy to grep for, and doesn't seem to exist in the current
> kernel as anything else.
>
> One question is if it would be a statement expression or an actual
> expression. I think the expression would read much better, IOW you
> could do
>
> val = data_race(p->field);
>
> instead of having to write it as
>
> data_race(val = p->field);
>
> to really point out the race. But at the same time, maybe you need to
> surround several statements, ie
>
> // intentionally racy xchg because we don't care and it generates
> better code
> data_race(a = p->field; p->field = b);
>
> which all would work fine with a non-instrumented macro something like this:
>
> #define data_race(x) ({ x; })
>
> which would hopefully give the proper syntax rules.
>
> But that might be very very inconvenient for KCSAN, depending on how
> you annotate the thing.
>
> So I _suspect_ that what you actually want is to do it as a statement,
> not as an expression. What's the actual underlying syntax for "ignore
> this code for thread safety checking"?

An expression works fine. The below patch would work with KCSAN, and all
your above examples work.

Re name: would it make sense to more directly convey the intent? I.e.
"this expression can race, and it's fine that the result is approximate
if it does"?

My vote would go to something like 'smp_lossy' or 'lossy_race' -- but
don't have a strong preference, and would also be fine with 'data_race'.
Whatever is most legible. Comments?

Thanks,
-- Marco


diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0b6506b9dd11..4d0597f89168 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -308,6 +308,29 @@ unsigned long read_word_at_a_time(const void *addr)
__u.__val; \
})

+#include <linux/kcsan.h>
+
+/*
+ * smp_lossy: macro that provides a way to document the intent that an
+ * expression may be lossy or approximate on an SMP system due to data races.
+ *
+ * This macro *does not* affect normal code generation, but is a hint to tooling
+ * that it is intentional that accesses in the expression may conflict with
+ * concurrent accesses.
+ */
+#ifdef __SANITIZE_THREAD__
+#define smp_lossy(expr) \
+ ({ \
+ typeof(({ expr; })) __val; \
+ kcsan_nestable_atomic_begin(); \
+ __val = ({ expr; }); \
+ kcsan_nestable_atomic_end(); \
+ __val; \
+ })
+#else
+#define smp_lossy(expr) ({ expr; })
+#endif
+
#endif /* __KERNEL__ */

/*