Re: [PATCH 1/3 v2] time: Fix clock->read(clock) race around clocksource changes

From: John Stultz
Date: Thu Jun 08 2017 - 15:02:34 EST


On Sun, Jun 4, 2017 at 11:52 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Wed, 31 May 2017, John Stultz wrote:
>>
>> The one exception where this helper isn't necessary is for the
>> fast-timekepers which use their own locking and update logic
>> to the tkr structures.
>
> That's simply wrong. The fast time keepers have exactly the same issue.
>
...
>
> So if you put the update in context:
>
> CPU0 CPU1
> rd = tkr->read;
> update_fast_timekeeper()
> write_seqcount_latch(tkr->seq);
> memcpy(tkr->base[0], newtkr);
> write_seqcount_latch(tkr->seq);
> memcpy(tkr->base[1], newtkr);
> cl = tkr->clock;
> now = rd(cl);
>
> Then you end up with the very same problem as with the general timekeeping
> itself.
>
> The two bases and the seqcount_latch() magic are there to allow using the
> fast timekeeper in NMI context, which can interrupt the update
> sequence. That guarantees that the reader which interrupted the update will
> always use a consistent tkr->base. But in no way does it protect against
> the read -> clock inconsistency caused by a concurrent or interrupting
> update.

Ah. I mistakenly thought the fast-timekeepers alternated on updates,
rather then both being updated at once.

Thanks for the clarification. I guess we'll need a fix there too.

>> + * tk_clock_read - atomic clocksource read() helper
>> + *
>> + * This helper is necessary to use in the read paths because, while the
>> + * seqlock ensures we don't return a bad value while structures are updated,
>> + * it doesn't protect from potential crashes. There is the possibility that
>> + * the tkr's clocksource may change between the read reference, and the
>> + * clock reference passed to the read function. This can cause crashes if
>> + * the wrong clocksource is passed to the wrong read function.
>
> Come on. The problem is not that it can cause crashes.
>
> The problem is that it hands in the wrong pointer. Even if it does not
> crash, it still can read from a location which has other way harder to
> debug side effects.
>
> Comments and changelogs should be written in a factual manner not like
> fairy tales.

Apologies, I've long been criticized for using the passive voice, and
I tend to hedge statements when I'm not totally confident I'm correct
(which with my batting avg, is most always). I'll try to improve this.

While I'm reworking this patch, if you have no objections to the other
two, are you open to queuing them up?

thanks
-john