Re: KCSAN: data-race in __alloc_file / __alloc_file

From: Marco Elver
Date: Fri Nov 08 2019 - 13:16:12 EST


On Fri, 8 Nov 2019 at 19:05, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 8, 2019 at 9:53 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > I personally like WRITE_ONCE() since it adds zero overhead on generated code,
> > and is the de facto accessor we have used for many years (before KCSAN was conceived)
>
> So I generally prefer WRITE_ONCE() over adding "volatile" to random
> data structure members.
>
> Because volatile *does* have potentially absolutely horrendous
> overhead on generated code. It just happens to be ok for the simple
> case of writing once to a variable.
>
> In fact, you bring that up yourself in your next email when you ask
> for "ADD_ONCE()". Exactly because gcc generates absolutely horrendous
> garbage for volatiles, for no actual good reason. Gcc *could* generate
> a single add-to-memory instruction. But no, that's not at all what gcc
> does.
>
> So for the kernel, we've generally had the rule to avoid 'volatile'
> data structures as much as humanly possible, because it actually does
> something much worse than it could do, and the source code _looks_
> simple when the volatile is hidden in the data structures.
>
> Which is why we have READ_ONCE/WRITE_ONCE - it puts the volatile in
> the code, and makes it clear not only what is going on, but also the
> impact it has on code generation.
>
> But at the same time, I don't love WRITE_ONCE() when it's not actually
> about writing once. It might be better to have another way to show
> "this variable is a flag that we set to a single value". Even if maybe
> the implementation is then the same (ie we use a 'volatile' assignment
> to make KCSAN happy).

(+some LKMM folks, in case I missed something about what the LKMM
defines as a data race.)

KCSAN does not use volatile to distinguish accesses. Right now
READ_ONCE, WRITE_ONCE, atomic bitops, atomic_t (+ some arch-specific
primitives) are treated as marked atomic operations.
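
To make the marked/plain distinction concrete, here is a minimal
userspace sketch. It is not the kernel's implementation: the
*_SKETCH macros, struct file_sketch and the helper functions are
hypothetical names introduced for illustration, and the volatile cast
only approximates READ_ONCE/WRITE_ONCE outside the kernel. As stated
above, KCSAN keys off the primitive being used, not off the volatile
qualifier. The sketch assumes GCC or Clang, since __typeof__ is a GNU
extension.

```c
#include <assert.h>

/*
 * Userspace approximations of marked accessors (assumption: GNU C,
 * because of __typeof__). The volatile lives in the access, not in
 * the data structure definition.
 */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical structure with a flag that is only ever set to 1. */
struct file_sketch {
	int f_flag;
};

/* Plain (unmarked) store: a race detector would treat this as plain. */
static inline void set_flag_plain(struct file_sketch *f)
{
	f->f_flag = 1;
}

/* Marked store: the intent is visible at the point of access. */
static inline void set_flag_marked(struct file_sketch *f)
{
	WRITE_ONCE_SKETCH(f->f_flag, 1);
}

/* Marked load, pairing with the marked store above. */
static inline int get_flag_marked(struct file_sketch *f)
{
	return READ_ONCE_SKETCH(f->f_flag);
}
```

Both setters store the same value; the difference is only whether the
access is marked at the call site, which is what a reader of the code
(and a tool such as KCSAN) can act on.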

The goal is to cover all primitives that the LKMM declares as
marked/atomic. A data race is then detected for concurrent conflicting
accesses where at least one is plain (unmarked). In the end, the LKMM
should decide what KCSAN reports as a data race. As far as I can
tell, none of the data races reported so far are false positives in
that sense.
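
The rule in the paragraph above can be written down as a small
predicate. This is only a sketch of the definition, not of how KCSAN
detects races at runtime; struct access_sketch and both function names
are hypothetical, and "conflicting" is taken to mean that the two
accesses overlap in memory and at least one of them is a write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a single memory access. */
struct access_sketch {
	uintptr_t addr;   /* start address of the access */
	size_t size;      /* number of bytes accessed */
	bool is_write;    /* true for a store, false for a load */
	bool is_marked;   /* true for READ_ONCE/WRITE_ONCE/atomics */
};

/* Two accesses overlap if their byte ranges intersect. */
static inline bool overlaps_sketch(const struct access_sketch *a,
				   const struct access_sketch *b)
{
	return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/*
 * Data race: concurrent, conflicting (overlapping, at least one
 * write), and at least one of the two accesses is plain.
 */
static inline bool is_data_race_sketch(const struct access_sketch *a,
				       const struct access_sketch *b,
				       bool concurrent)
{
	bool conflicting = overlaps_sketch(a, b) &&
			   (a->is_write || b->is_write);
	bool any_plain = !a->is_marked || !b->is_marked;

	return concurrent && conflicting && any_plain;
}
```

Under this definition a plain store racing with a READ_ONCE() load is
still a data race, while a WRITE_ONCE() store concurrent with a
READ_ONCE() load is not.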

Many thanks,
-- Marco