RE: Memory corruption due to word sharing

From: Boehm, Hans
Date: Wed Feb 01 2012 - 16:13:26 EST


> From: Linus Torvalds
> >
> > We need a proper memory model.
>
> Not really.
>
> The fact is, the kernel will happily take the memory model of the
> underlying hardware. Trying to impose some compiler description of the
> memory model is actually horribly bad, because it automatically also
> involves compiler synchronization points - which will try to be
> portable and easy to understand, and quite frankly, that will
> automatically result in what is technically known as a "shitload of
> crap".
The C11 memory model potentially adds overhead in only two cases:

1. When current code involves touching a field that wouldn't otherwise be touched. There are odd cases in which this measurably slows down code, but I think all agree that we need it. In addition to bitfields, it can affect speculatively promoting a value to a register in a loop, which at least older versions of gcc also do.

2. When you use atomic operations for racing accesses. And in that case, if you really want it, you get a lot of control over memory ordering. Atomic operations that specify "memory_order_relaxed" should only have three kinds of impact:
- They ensure the access is indivisible.
- They tell the compiler that the location may be concurrently modified and it should refrain from optimizations that will break if it is.
- They enforce cache coherence, i.e. single-variable ordering. This is implicit on most architectures, but requires ld.acq on Itanium. IIRC, Paul McKenney argued convincingly that this was essential for preserving programmer sanity.

>
> Now, a strict memory model is fine for portability, and to make it
> simple for programmers. I actually largely approve of the Java memory
> model approach, even if it has been horribly problematic and has some
> serious problems. But it's not something that is acceptable for the
> kernel - we absolutely do *not* want the compiler to introduce some
> memory model, because we are very much working together with whatever
> the hardware memory model is, and we do our own serialization
> primitives.
I think you are somewhat misinterpreting the C11 memory model. Aside from the minor cache coherence issues on one or two architectures, you can still do that, if you like. But I suspect you can also do better in places.

>
> > No vague assumptions with lots of hand-waving.
>
> So here's basically what the kernel needs:
>
> - if we don't touch a field, the compiler doesn't touch it.
>
> This is the rule that gcc now violates with bitfields.
And in other more subtle cases. At least until very recently.

>
> This is a gcc bug. End of story. The "volatile" example proves it -
> anybody who argues otherwise is simply wrong, and is just trying to
> make excuses.
Agreed.

>
> - if we need to touch something only once, or care about something
> that is touched only conditionally, and we actually need to make sure
> that it is touched *only* under that condition, we already go to quite
> extreme lengths to tell the compiler so.
>
> We usually use an access with a "volatile" cast on it or tricks
> like that. Or we do the whole "magic asm that says it modified memory
> to let the compiler know not to try anything clever"
>
> - The kernel IS NOT GOING TO MARK DATA STRUCTURES.
>
> Marking data structures is seriously stupid and wrong. It's not the
> *data* that has access rules, it's the *code* that has rules of
> access.
>
> The traditional C "volatile" is misdesigned and wrong. We don't
> generally mark data volatile, we really mark *code* volatile - which
> is why our "volatiles" are in the casts, not on the data structures.
>
> Stuff that is "volatile" in one context is not volatile in another. If
> you hold a RCU write lock, it may well be entirely stable, and marking
> it volatile is *wrong*, and generating code as if it was volatile is
> pure and utter shit.
>
> On the other hand, if you are touching *the*very*same* field while you
> are only read-locked for RCU, it may well be one of those "this has to
> be read by accessing it exactly once".
>
> And we do all this correctly in the kernel. Modulo bugs, of course,
> but the fundamental rule really is: "atomicity or volatility is about
> CODE, not DATA".
The preferred C11 model is clearly to mark data that potentially participates in unprotected concurrent accesses, and then to annotate accesses that require less care, e.g. because we know they can't occur concurrently with other accesses. Maybe this is the wrong approach for the kernel, but to me that seems like a safer approach. We did consider the alternative model, and tried to ensure that casts to atomic could also work on as many platforms as possible, though we can't guarantee that. (There is currently no specification for atomic accesses that says "not concurrently accessed". We've thought about it. I would argue for adding it if there were important cases in which memory_order_relaxed is demonstrably not close enough.)

>
> And no, C11 does *not* do it correctly. The whole "_Atomic" crap is
> exactly the same mistake as "volatile" was. It's wrong. Marking data
> _Atomic is a sure sign that whoever designed it didn't understand
> things.
>
> > The only candidate that I see is the C++11/C11 model.  Any other
> > suggestions?
>
> The C11 model addresses the wrong thing, and addresses it badly.
>
> You might as well ignore it as far as the kernel is concerned. I'm
> sure it helps some *other* problem, but it's not the kernel problem.
But it seems to me that the core of the C11 model consists largely of the rules you're asking for. Even if you continue to use volatile and ignore atomic, it seems to me that you're far ahead, though still on needlessly thin ice, with the C11 model.

>
> The rules really are very simple, and the kernel already does
> everything to make it easy for the compiler.
>
> When we do something that the compiler cannot re-order around, we do
> an asm() and mark it as modifying memory so that the compiler won't
> screw things up. In addition, we will do whatever that the CPU
> requires for memory ordering, and quite frankly, the compiler will
> never have sufficient locking primitives to satisfy us, and there is
> no real point in even trying. If you try to do locking in the
> compiler, you *will* do it wrong.
Which is one reason C11 gives you fine control over atomics that can be used to implement locks.

>
> If you add random flags on data structures ("_Atomic" or "volatile" or
> whatever), you *will* do it wrong. It's simply a fundamentally broken
> model.
>
> So the *only* memory model we want from the compiler really is: "don't
> write to fields that the source code didn't write to".
>
> It's really is that simple.
You clearly also need to say something about adjacent bitfields. Nor would it be usable outside the kernel, because the programmer needs to understand the esoteric fence semantics etc. on every architecture in order to really use it. (It also isn't exactly a language specification, but that's OK here.) But again I think the C11 spec implies exactly what you want here.

>
> > So, would it be okay to tell the compiler which part of the state is
> > accessed concurrently (ie, locks, atomics, ...)?
>
> Seriously, no.
>
> See above: it's not the "state" that is accessed concurrently. It's
> the code. If you ever try to mark state, you've already lost. The same
> "state" can be atomic or not depending on context. It's not about the
> state or the data structures, and it never will be.
I think we have no disagreement that in the end it's the access whose properties you need to specify. The question is what constitutes the most convenient and robust way to do so. Clearly we specify other attributes of accesses (length, alignment) using the type, and perhaps override it when necessary. C11 atomics arguably use the same approach.

Hans

>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ia64"
> in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/