Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()

From: Oleg Nesterov
Date: Sun Sep 29 2013 - 10:04:20 EST


On 09/27, Oleg Nesterov wrote:
>
> I tried hard to find any hole in this version but failed, I believe it
> is correct.

And I still believe it is. But now I am starting to think that we
don't need cpuhp_seq. (and imo cpuhp_waitcount, but this is minor).

> We need to ensure 2 things:
>
> 1. The reader should notice state = BLOCK or the writer should see
>    inc(__cpuhp_refcount). This is guaranteed by 2 mb's in
>    __get_online_cpus() and in cpu_hotplug_begin().
>
>    We do not care if the writer misses some inc(__cpuhp_refcount)
>    in per_cpu_sum(__cpuhp_refcount), that reader(s) should notice
>    state = readers_block (and inc(cpuhp_seq) can't help anyway).

Yes!

> 2. If the writer sees the result of this_cpu_dec(__cpuhp_refcount)
>    from __put_online_cpus() (note that the writer can miss the
>    corresponding inc() if it was done on another CPU, so this dec()
>    can lead to sum() == 0),

But this can't happen in this version? Somehow I forgot that
__get_online_cpus() does inc/get under preempt_disable(), always on
the same CPU. And thanks to mb's the writer should not miss the
reader which has already passed the "state != BLOCK" check.

To simplify the discussion, let's ignore the "readers_fast" state;
the synchronize_sched() logic looks obviously correct. IOW, let's
discuss only the SLOW -> BLOCK transition.

cpu_hotplug_begin()
{
	state = BLOCK;

	mb();

	wait_event(cpuhp_writer,
		   per_cpu_sum(__cpuhp_refcount) == 0);
}

should work just fine? Ignoring all details, we have

get_online_cpus()
{
again:
	preempt_disable();

	__this_cpu_inc(__cpuhp_refcount);

	mb();

	if (state == BLOCK) {

		mb();

		__this_cpu_dec(__cpuhp_refcount);
		wake_up_all(cpuhp_writer);

		preempt_enable();
		wait_event(state != BLOCK);
		goto again;
	}

	preempt_enable();
}

It seems to me that these mb's guarantee all we need, no?

It looks really simple. The reader can only succeed if it doesn't see
BLOCK; in this case per_cpu_sum() should see the change.

We have

	WRITER				READER on CPU X

	state = BLOCK;			__cpuhp_refcount[X]++;

	mb();				mb();

	...
	count += __cpuhp_refcount[X];	if (state != BLOCK)
	...					return;

					mb();
					__cpuhp_refcount[X]--;

Either reader or writer should notice the STORE we care about.

If a reader can decrement __cpuhp_refcount, we have 2 cases:

	1. It is the reader holding this lock. In this case we
	   can't miss the corresponding inc() done by this reader,
	   because this reader didn't see BLOCK in the past.

	   It is just the

		A == B == 0

		CPU_0		CPU_1
		-----		-----
		A = 1;		B = 1;
		mb();		mb();
		b = B;		a = A;

	   pattern; at least one CPU should see 1 in its a/b.

	2. It is the reader which tries to take this lock and
	   noticed state == BLOCK. We could miss the result of
	   its inc(), but we do not care, this reader is going
	   to block.

	   _If_ the reader could migrate between inc/dec, then
	   yes, we have a problem. Because that dec() could make
	   the result of per_cpu_sum() = 0. IOW, we could miss
	   inc() but notice dec(). But given that it does both
	   on the same CPU, this is not possible.

So why do we need cpuhp_seq?

Oleg.
