Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Thu Feb 06 2014 - 18:26:16 EST

Next message: Rafael J. Wysocki: "[GIT PULL] ACPI and power management fixes for v3.14-rc2"
Previous message: Andrew Morton: "Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no localmemory and limit readahead pages"
In reply to: Joseph S. Myers: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Next in thread: Joseph S. Myers: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2014-02-06 at 22:13 +0000, Joseph S. Myers wrote:
> On Thu, 6 Feb 2014, Torvald Riegel wrote:
>
> > > It seems that nobody really
> > > agrees on exactly how the C11 atomics map to real architectural
> > > instructions on anything but the trivial architectures.
> >
> > There's certainly different ways to implement the memory model and those
> > have to be specified elsewhere, but I don't see how this differs much
> > from other things specified in the ABI(s) for each architecture.
>
> It is not clear to me that there is any good consensus understanding of
> how to specify this in an ABI, or how, given an ABI, to determine whether
> an implementation is valid.
>
> For ABIs not considering atomics / concurrency, it's well understood, for
> example, that the ABI specifies observable conditions at function call
> boundaries ("these arguments are in these registers", "the stack pointer
> has this alignment", "on return from the function, the values of these
> registers are unchanged from the values they had on entry"). It may
> sometimes specify things at other times (e.g. presence or absence of a red
> zone - whether memory beyond the stack pointer may be overwritten on an
> interrupt). But if it gives a code sequence, it's clear this is just an
> example rather than a requirement for particular code to be generated -
> any code sequence suffices if it meets the observable conditions at the
> points where code generated by one implementation may pass control to code
> generated by another implementation.
>
> When atomics are involved, you no longer have a limited set of
> well-defined points where control may pass from code generated by one
> implementation to code generated by another - the code generated by the
> two may be running concurrently.

Agreed.

> We know of certain cases
> <http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html> where there are
> choices of the mapping of atomic operations to particular instructions.
> But I'm not sure there's much evidence that these are the only ABI issues
> arising from concurrency - that there aren't any other ways in which an
> implementation may transform code, consistent with the as-if rule of ISO
> C, that may run into incompatibilities of different choices.

I can't think of other incompatibilities with high likelihood --
provided we ignore consume memory order and the handling of dependencies
(see below). But I would doubt there is a high risk of such because (1)
the data race definition should hopefully not cause subtle
incompatibilities and (2) there is clear "hand-off point" from the
compiler to a specific instruction representing an atomic access. For
example, if we have a release store, and we agree on the instructions
used for that, then compilers will have to ensure happens-before for
anything before the release store; for example, as long as stores
sequenced-before the release store are performed, it doesn't matter in
which order that happens. Subsequently, an acquire-load somewhere else
can pick this sequence of events up just by using the agreed-upon
acquire-load; like with the stores, it can order subsequent loads in any
way it sees fit, including different optimizations. That's obviously
not a formal proof, though. But it seems likely to me, at least :)

I'm more concerned about consume and dependencies because as far as I
understand the standard's requirements, dependencies need to be tracked
across function calls. Thus, we might have several compilers involved
in that, and we can't just "condense" things to happens-before, but it's
instead how and that we keep dependencies intact. Because of that, I'm
wondering whether this is actually implementable practically. (See
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448#c10)

> And even if
> those are the only issues, it's not clear there are well-understood ways
> to define the mapping from the C11 memory model to the architecture's
> model, which provide a good way to reason about whether a particular
> choice of instructions is valid according to the mapping.

I think that if we have different options, there needs to be agreement
on which to choose across the compilers, at the very least. I don't
quite know how this looks like for GCC and LLVM, for example.

> > Are you familiar with the formalization of the C11/C++11 model by Batty
> > et al.?
> > http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf
> > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
>
> These discuss, as well as the model itself, proving the validity of a
> particular choice of x86 instructions. I imagine that might be a starting
> point towards an understanding of how to describe the relevant issues in
> an ABI, and how to determine whether a choice of instructions is
> consistent with what an ABI says. But I don't get the impression this is
> yet at the level where people not doing research in the area can routinely
> read and write such ABIs and work out whether a code sequence is
> consistent with them.

It's certainly complicated stuff (IMHO). On the positive side, it's
just a few compilers, the kernel, etc. that have to deal with this, if
most programmers use the languages' memory model.

> (If an ABI says "use instruction X", then you can't use a more efficient
> X' added by a new version of the instruction set. But it can't
> necessarily be as loose as saying "use X and Y, or other instructions that
> achieve semantics when the other thread is using X or Y", because it might
> be the case that Y' interoperates with X, X' interoperates with Y, but X'
> and Y' don't interoperate with each other. I'd envisage something more
> like mapping not to instructions, but to concepts within the
> architecture's own memory model - but that requires the architecture's
> memory model to be as well defined, and probably formalized, as the C11
> one.)

Yes. The same group of researchers have also worked on formalizing the
Power model, and use this as base for a proof of the correctness of the
proposed mappings: http://www.cl.cam.ac.uk/~pes20/cppppc/

The formal approach to all this might be a complex task, but it is more
confidence-inspiring than making guesses about one standard's prose vs.
another specification's prose.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Rafael J. Wysocki: "[GIT PULL] ACPI and power management fixes for v3.14-rc2"
Previous message: Andrew Morton: "Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no localmemory and limit readahead pages"
In reply to: Joseph S. Myers: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Next in thread: Joseph S. Myers: "Re: [RFC][PATCH 0/5] arch: atomic rework"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]