Re: [PATCH 0/7] ARM: hacks for link-time optimization

From: Peter Zijlstra
Date: Tue Dec 18 2018 - 05:00:36 EST


On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
>
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
>
> Specifically; if you have something like:
>
> int idx;
> struct object objs[2];
>
> the statement:
>
> val = objs[idx & 1].ponies;
>
> which you 'need' to be translated like:
>
> struct object *obj = objs;
> obj += (idx & 1);
> val = obj->ponies;
>
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
>
> if (idx & 1)
> 	obj = &objs[1];
> else
> 	obj = &objs[0];
>
> val = obj->ponies;
>
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
>
> Note that further 'optimization' is possible and the compiler could even
> make it:
>
> if (idx & 1)
> 	val = objs[1].ponies;
> else
> 	val = objs[0].ponies;

A variant that is actually broken on x86 too (due to issuing the loads
in the 'wrong' order):

val = objs[0].ponies;
if (idx & 1)
	val = objs[1].ponies;

Which is a translation that makes sense if we either marked
unlikely(idx & 1) or if PGO found the same.
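
To make the variants concrete, here is a small userspace sketch. The
layout of struct object (a single int member) and the values 10/20 are
made up for illustration; only the shape of the three loads comes from
the example. All three functions return the same value for any idx in a
single thread, which is exactly why C lets the compiler pick any of
them. They differ only in how the load of ->ponies is ordered against
the load of idx.

```c
/* Hypothetical stand-in for the objects in the example. */
struct object {
	int ponies;
};

static struct object objs[2] = { { .ponies = 10 }, { .ponies = 20 } };

/* Address dependency: the address of the load is computed from idx. */
static int load_addr_dep(int idx)
{
	struct object *obj = objs;

	obj += (idx & 1);
	return obj->ponies;
}

/* Control dependency: idx only selects which branch is taken. */
static int load_ctrl_dep(int idx)
{
	if (idx & 1)
		return objs[1].ponies;
	else
		return objs[0].ponies;
}

/* Hoisted variant: objs[0] is loaded before idx is even tested. */
static int load_hoisted(int idx)
{
	int val = objs[0].ponies;

	if (idx & 1)
		val = objs[1].ponies;
	return val;
}
```

Nothing in this sketch demonstrates the breakage itself; that needs a
concurrent writer and weakly ordered hardware (or, for the hoisted
variant, just the reordered loads).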

> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
>
> The more the compiler can see of the complete program, the more likely
> it can make inferences like this, esp. when coupled with PGO.
>
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
>
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.
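
For reference, a rough userspace sketch of the 'volatile' wrapping
mentioned above. READ_ONCE_SKETCH() is a simplified stand-in written
for this mail (the old volatile-cast form, assuming GCC/Clang for
__typeof__), not the kernel's actual READ_ONCE(); struct item, items[]
and shared_idx are likewise made-up names.

```c
/* Simplified stand-in: force one real load through a volatile lvalue. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical data, mirroring the objs[]/idx example. */
struct item {
	int ponies;
};

static struct item items[2] = { { .ponies = 10 }, { .ponies = 20 } };
static int shared_idx;

static int read_ponies(void)
{
	/* The volatile access pins down the load of shared_idx itself... */
	int i = READ_ONCE_SKETCH(shared_idx);

	/*
	 * ...but the indexing below is ordinary C again, so the compiler
	 * may still lower it to a branch instead of address arithmetic.
	 */
	return items[i & 1].ponies;
}
```

The volatile access constrains how shared_idx is loaded; it says nothing
about how the dependent access is generated, which is the gap a
compiler knob would have to cover.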