Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Mon Mar 03 2014 - 15:45:27 EST


On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
> On 1 March 2014 08:03, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
> >> Hi Paul,
> >>
> >> On 28 February 2014 18:50, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
> >> >> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >> >> > >
> >> >> > > 3. The comparison was against another RCU-protected pointer,
> >> >> > > where that other pointer was properly fetched using one
> >> >> > > of the RCU primitives. Here it doesn't matter which pointer
> >> >> > > you use. At least as long as the rcu_assign_pointer() for
> >> >> > > that other pointer happened after the last update to the
> >> >> > > pointed-to structure.
> >> >> > >
> >> >> > > I am a bit nervous about #3. Any thoughts on it?
> >> >> >
> >> >> > I think that it might be worth pointing out as an example, and saying
> >> >> > that code like
> >> >> >
> >> >> > 	p = atomic_read(consume);
> >> >> > 	X;
> >> >> > 	q = atomic_read(consume);
> >> >> > 	Y;
> >> >> > 	if (p == q)
> >> >> > 		data = p->val;
> >> >> >
> >> >> > then the access of "p->val" is constrained to be data-dependent on
> >> >> > *either* p or q, but you can't really tell which, since the compiler
> >> >> > can decide that the values are interchangeable.
> >> >> >
> >> >> > I cannot for the life of me come up with a situation where this would
> >> >> > matter, though. If "X" contains a fence, then that fence will be a
> >> >> > stronger ordering than anything the consume through "p" would
> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
> >> >> > ordering to the access through "p" is through p or q is kind of
> >> >> > irrelevant. No?
> >> >>
> >> >> I can make a contrived litmus test for it, but you are right, the only
> >> >> time you can see it happen is when X has no barriers, in which case
> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
> >> >> reorder the loads into p and q, and the read from p->val can, as you say,
> >> >> come from either pointer.
> >> >>
> >> >> For whatever it is worth, here is the litmus test:
> >> >>
> >> >> T1:	p = kmalloc(...);
> >> >> 	if (p == NULL)
> >> >> 		deal_with_it();
> >> >> 	p->a = 42; /* Each field in its own cache line. */
> >> >> 	p->b = 43;
> >> >> 	p->c = 44;
> >> >> 	atomic_store_explicit(&gp1, p, memory_order_release);
> >> >> 	p->b = 143;
> >> >> 	p->c = 144;
> >> >> 	atomic_store_explicit(&gp2, p, memory_order_release);
> >> >>
> >> >> T2:	p = atomic_load_explicit(&gp2, memory_order_consume);
> >> >> 	r1 = p->b; /* Guaranteed to get 143. */
> >> >> 	q = atomic_load_explicit(&gp1, memory_order_consume);
> >> >> 	if (p == q) {
> >> >> 		/* The compiler decides that q->c is same as p->c. */
> >> >> 		r2 = p->c; /* Could get 44 on weakly ordered system. */
> >> >> 	}
> >> >>
> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> >> >> you get.
> >> >>
> >> >> And publishing a structure via one RCU-protected pointer, updating it,
> >> >> then publishing it via another pointer seems to me to be asking for
> >> >> trouble anyway. If you really want to do something like that and still
> >> >> see consistency across all the fields in the structure, please put a lock
> >> >> in the structure and use it to guard updates and accesses to those fields.
> >> >
> >> > And here is a patch documenting the restrictions for the current Linux
> >> > kernel. The rules change a bit due to rcu_dereference() acting a bit
> >> > differently than atomic_load_explicit(&p, memory_order_consume).
> >> >
> >> > Thoughts?
> >>
> >> That might serve as informal documentation for linux kernel
> >> programmers about the bounds on the optimisations that you expect
> >> compilers to do for common-case RCU code - and I guess that's what you
> >> intend it to be for. But I don't see how one can make it precise
> >> enough to serve as a language definition, so that compiler people
> >> could confidently say "yes, we respect that", which I guess is what
> >> you really need. As a useful criterion, we should aim for something
> >> precise enough that in a verified-compiler context you can
> >> mathematically prove that the compiler will satisfy it (even though
> >> that won't happen anytime soon for GCC), and that analysis tool
> >> authors can actually know what they're working with. All this stuff
> >> about "you should avoid cancellation", and "avoid masking with just a
> >> small number of bits" is just too vague.
> >
> > Understood, and yes, this is intended to document current compiler
> > behavior for the Linux kernel community. It would not make sense to show
> > it to the C11 or C++11 communities, except perhaps as an informational
> > piece on current practice.
> >
> >> The basic problem is that the compiler may be doing sophisticated
> >> reasoning with a bunch of non-local knowledge that it's deduced from
> >> the code, neither of which are well-understood, and here we have to
> >> identify some envelope, expressive enough for RCU idioms, in which
> >> that reasoning doesn't allow data/address dependencies to be removed
> >> (and hence the hardware guarantee about them will be maintained at the
> >> source level).
> >>
> >> The C11 syntactic notion of dependency, whatever its faults, was at
> >> least precise, could be reasoned about locally (just looking at the
> >> syntactic code in question), and did do that. The fact that current
> >> compilers do optimisations that remove dependencies and will likely
> >> have many bugs at present is beside the point - this was surely
> >> intended as a *new* constraint on what they are allowed to do. The
> >> interesting question is really whether the compiler writers think that
> >> they *could* implement it in a reasonable way - I'd like to hear
> >> Torvald and his colleagues' opinion on that.
> >>
> >> What you're doing above seems to be basically a very cut-down version
> >> of that, but with a fuzzy boundary. If you want it to be precise,
> >> maybe it needs to be much simpler (which might force you into ruling
> >> out some current code idioms).
> >
> > I hope that Torvald Riegel's proposal (https://lkml.org/lkml/2014/2/27/806)
> > can be developed to serve this purpose.
>
> (I missed that mail when it first came past, sorry)
>
> That's also going to be tricky, I'm afraid. The key condition there is:
>
> "* at the time of execution of E, L [PS: I assume that L is a
> typo and should be E]

No, L was intended. (But see below.)

> can possibly have returned at
> least two different values under the assumption that L itself
> could have returned any value allowed by L's type."
>
> First, the evaluation of E might be nondeterministic - e.g., for an
> artificial example, if it's just a nondeterministic value obtained
> from the result of a race on SC atomics. The above doesn't
> distinguish between that (which doesn't have a real dependency on L)
> and that XOR'd with L (which does). And it does so in the wrong
> direction: it'll say that the former has a dependency on L.

I'm not quite sure I understand the examples you want to point out
(could you add brief code snippets, perhaps?) -- but the informal
definition I proposed also says that E must have used L in some way in
its computation. That's pretty vague, and the way I phrased the
requirement is probably not optimal.
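
In case it helps pin down what I'm asking about, here is my guess at
the kind of example you mean (names made up, so this may well not be
what you intended):

	/* Assume: _Atomic(int) gl, racy; both written by other threads. */
	int v  = atomic_load_explicit(&gl, memory_order_consume);   /* L */
	int r  = atomic_load_explicit(&racy, memory_order_seq_cst);
	int e1 = r;      /* nondeterministic, but does not really depend on L */
	int e2 = r ^ v;  /* really does depend on L */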

So let me expand a bit on the background first. What I tried to capture
with the rules is that an evaluation (ie, E) really uses the value
returned by L, and not just, for example, a constant value that can be
inferred from whatever was executed between L and E. This is intended
to prevent value prediction and the like by programmers. The compiler,
in turn, knows what a program that still wants to rely on the ordering
provided by the consume is allowed to do. Basing all of this on values
seemed helpful because basing it purely on syntax didn't seem
implementable; for the latter, we'd need to disallow cases such as x-x
(see the snippet below), or require compilers to include artificial
dependencies if a program indeed does value speculation. IOW, it didn't
seem possible to draw a clear line between a data dependency defined
purely syntactically and a control dependency.
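
To illustrate the x-x case I have in mind, a made-up snippet:

	/* Assume: _Atomic(int *) gp; int arr[2]; <stdint.h> for uintptr_t. */
	int *p  = atomic_load_explicit(&gp, memory_order_consume);  /* L */
	int idx = (int)((uintptr_t)p - (uintptr_t)p);  /* syntactically uses p,
	                                                  but always 0 */
	int r   = arr[idx];  /* no real value dependency on L; a purely
	                        syntactic rule would have to either forbid
	                        this or force the compiler to preserve an
	                        artificial dependency here */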

Another way to perhaps phrase the requirement might be to construct it
inductively, so that "E uses the value returned by L" becomes clearer.
However, I currently don't really know how to do that without any holes.

For example, let's say that an atomic mo_consume load has a value
dependency on itself; the dependency has an associated set of values,
which is equal to all values allowed by the type (but see also below).
Then, an evaluation A that uses B as an operand has a value dependency
on B if the associated set of values can still have more than one
element. Whether that's the case depends on the operation and the other
operands, including their values at the time of execution.
However, it's not just the results of evaluations that effectively
constrain the set of values. Conditional execution does so as well,
because the condition being true might establish a constraint on L,
which might remove any dependency when L (or a result of a computation
involving L) is used (see the snippet below). So I'm not sure the
inductive approach would work.
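
As a made-up example of the conditional-execution problem:

	/* Assume: struct foo { int val; }; _Atomic(struct foo *) gp;
	 * static struct foo def; int r; */
	struct foo *p = atomic_load_explicit(&gp, memory_order_consume);  /* L */
	if (p == &def) {
		/* The condition constrains L's value set to a single element
		 * here, so the compiler may rewrite the access as def.val and
		 * drop the address dependency on L entirely. */
		r = p->val;
	}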

You said you thought I might have wanted to say that:
"E really-depends on L if there exist two different values that (just
according to typing) might be read for L that give rise to two
different values for E".

This should follow from the above, or at least should not conflict with
it. My "definition" tried to capture that the program must not
establish so many constraints between E and L that the value of L
becomes clear in the sense of having exactly one possible value. If we
dereference an E whose execution in fact constrains L to one value,
there is no real dependency anymore.
However, by itself, this doesn't cover that E must use L in its
computation -- which your formulation does, I think.

Another way might be to try to define which constraints established by
an execution (based on the executed code's semantics) actually remove
value dependencies on L.

Do you have any suggestions for how to define this (ie, true value
dependencies) in a better way?

> Second, it involves reasoning about counterfactual executions. That
> doesn't necessarily make it wrong, per se, but probably makes it hard
> to work with. For example, suppose that in all the actual
> whole-program executions, a runtime occurrence of L only ever returns
> one particular value (perhaps because of some simple #define'd
> configuration)

Right, or just because it's a deterministic program.

This is a problem for the definition of value dependencies, AFAICT,
because where do you draw the line for what the compiler is allowed to
speculate about? When does the program indeed "reveal" the value of a
load, and when does it not? Is a compiler allowed to do out-of-band
tracking of certain memory locations, and thus predict the value?

Adding the assumption that L, at least conceptually, might return any
value seemed to be a simple way to remove that uncertainty regarding
what the compiler might know about.
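
To give a made-up example of why that assumption matters:

	/* Assume: struct buf { int data; }; _Atomic(struct buf *) gp;
	 * static struct buf only_buf; int r; and that, in this build
	 * configuration, gp is only ever set to &only_buf. */
	struct buf *p = atomic_load_explicit(&gp, memory_order_consume);  /* L */
	r = p->data;  /* under the "L might conceptually return any value"
	                 assumption the compiler must keep the address
	                 dependency, even though whole-program analysis could
	                 prove p == &only_buf */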

> , and that the code used in the evaluation of E depends
> on some invariant which is related to that configuration.

A related issue that I've been thinking about is whether things like
divide-by-zero (ie, operations that give rise to undefined behavior)
should be considered as constraints on the values or not. I guess they
should, but then this seems closer to what you mention, invariants
established by the whole program.
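
A made-up example of what I mean:

	/* Assume: _Atomic(int) gv. */
	int v = atomic_load_explicit(&gv, memory_order_consume);  /* L */
	int r = 1000 / (v - 42);  /* if this is reachable, the absence of
	                             undefined behavior implies v != 42; does
	                             that count as a constraint on the set of
	                             values that L can have returned? */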

> The
> hypothetical execution used above in which a different value is used
> is one in which the code is being run in a situation with broken invariants.
> Then there will be technical difficulties in using the definition:
> I don't see how one would persuade oneself that a compiler always
> satisfies it, because these hypothetical executions are far removed
> from what it's actually working on.

I think the above is easy to implement for a compiler. At the
mo_consume load, you simply forget any information you have about this
value / memory location; for operations on value_dep_preserving types,
you ignore any out-of-band information that might originate from code
run before the load (IOW, you take a fresh start on each load in terms
of analysis info). Having the requirement that the program indeed needs
to have a value dependency should allow the compiler to run normal
optimizations on the code.
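
For example, a made-up sketch of what the compiler would have to do:

	/* Assume: struct foo { int a; }; _Atomic(struct foo *) gp;
	 * static struct foo def; int r1; and that earlier analysis told the
	 * compiler that gp very likely (or even provably) holds &def. */
	struct foo *p = atomic_load_explicit(&gp, memory_order_consume);  /* L */
	r1 = p->a;  /* the compiler must not substitute def.a here; at the
	               consume load it forgets what it knew about gp's value,
	               so the address dependency of this access is preserved */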

I'd really appreciate any feedback or alternative suggestions for how to
formulate the requirements more precisely. I think many of us kind of
agree about the general approach we'd like to try, but specifying this
precisely is another step.
