Re: [RFC PATCH v7 1/7] Restartable sequences system call

From: Andy Lutomirski
Date: Wed Aug 10 2016 - 15:31:28 EST


On Wed, Aug 3, 2016 at 10:03 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
>>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
>>> >> We really care about preemption here. Every migration implies a
>>> >> preemption from a user-space perspective. If we would only care
>>> >> about keeping the CPU id up-to-date, hooking into migration would be
>>> >> enough. But since we want atomicity guarantees for restartable
>>> >> sequences, we need to hook into preemption.
>>> >
>>> >> It allows user-space to perform update operations on per-cpu data without
>>> >> requiring heavy-weight atomic operations.
>>> >
>>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
>>> >
>>> > It is however on PPC and possibly other architectures, so in name of
>>> > simplicity supporting only the one variant makes sense.
>>> >
>>>
>>> I wouldn't want to depend on CMPXCHG. But imagine we had primitives
>>> that were narrower than the full abort-on-preemption primitive.
>>> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
>>> *aptr != aval). We could do things like:
>>>
>>> expected_cpu = cpu;
>>> aval = NULL; // disarm for now
>>> begin();
>>> aval = event_count[cpu] + 1;
>>> event_count[cpu] = aval;
>>> event_count[cpu]++;
>>
>> This line is redundant, right? Because it will guarantee a failure even
>> in no-contention cases.
>>
>>>
>>> ... compute something ...
>>>
>>> // arm the rest of it
>>> aptr = &event_count[cpu];
>>> if (*aptr != aval)
>>> goto fail;
>>>
>>> *thing_im_writing = value_i_computed;
>>> end();
>>>
>>> The idea here is that we don't rely on the scheduler to increment the
>>> event count at all, which means that we get to determine the scope of
>>> what kinds of access conflicts we care about ourselves.
>>>
>>
>> If we increase the event count in userspace, how could we prevent two
>> userspace threads from racing on the event_count[cpu] field? For
>> example:
>>
>> CPU 0
>> ================
>> {event_count[0] is initially 0}
>>
>> [Thread 1]
>> begin();
>> aval = event_count[cpu] + 1; // 1
>>
>> (preempted)
>> [Thread 2]
>> begin();
>> aval = event_count[cpu] + 1; // 1, too
>> event_count[cpu] = aval; // event_count[0] is 1
>>
>
> You're right :( This would work with an xadd instruction, but that's
> very slow and doesn't exist on most architectures. It could also work
> if we did:

Thinking about this slightly more, maybe it does work. We could use
basically the same mechanism to allow the kernel to restart if the
specific sequence:

aval = event_count[cpu] + 1
event_count[cpu] = avall

gets preempted by setting aptr = &event_count[cpu] and aval to
event_count[cpu], like this (although I might have screwed up any
number of small details):

aptr = &event_count[cpu];
barrier();
aval = event_count[cpu];
barrier();
tmp = aval + 1;
event_count[cpu] = tmp;
/* preemption here will cause an unnecessary retry, but that's okay */
aval = tmp;

--Andy