Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

From: Srivatsa S. Bhat
Date: Mon Dec 10 2012 - 00:03:33 EST


On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
>
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help us..
>> It won't get rid of the currently running preempt_disable() sections no?
>
> Sure. But (again, this is only my feeling so far) given that get_online_cpus_atomic()
> does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is that interrupt
handlers can potentially be hotplug readers too. So we need to protect the part
of get_online_cpus_atomic() that is not re-entrant.
(Which reminds me to try to shorten the cli/sti section in that code, if possible.)
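To illustrate the re-entrancy concern, here is a rough single-threaded userspace
model, not the patch's actual code. It assumes the non-re-entrant part is a plain
read-modify-write of a reader count; every name in it (refcount, raise_irq(),
model_sti(), nonreentrant_increment()) is hypothetical, and the "interrupt" is
just a function call placed in the middle of the update.

```c
#include <assert.h>
#include <stdbool.h>

static int refcount;        /* hypothetical per-CPU reader count */
static bool irqs_masked;    /* models the state between cli and sti */
static bool irq_deferred;   /* an interrupt arrived while masked */

/* Models an interrupt handler that is itself a hotplug reader. */
static void reader_irq_handler(void)
{
    refcount = refcount + 1;
}

/* Models interrupt delivery: run the handler now, or defer it if masked. */
static void raise_irq(void)
{
    if (irqs_masked)
        irq_deferred = true;
    else
        reader_irq_handler();
}

/* Models sti: unmask and run any interrupt that was deferred. */
static void model_sti(void)
{
    irqs_masked = false;
    if (irq_deferred) {
        irq_deferred = false;
        reader_irq_handler();
    }
}

/* A read-modify-write with an interrupt landing in the middle of it. */
static void nonreentrant_increment(bool mask)
{
    int tmp;

    if (mask)
        irqs_masked = true;     /* models cli */
    tmp = refcount;             /* read */
    raise_irq();                /* interrupt arrives mid-update */
    refcount = tmp + 1;         /* write back, possibly a stale value */
    if (mask)
        model_sti();
}

/* Returns the final count after one task-level and one IRQ-level reader. */
static int run_increment(bool mask)
{
    refcount = 0;
    irqs_masked = false;
    irq_deferred = false;
    nonreentrant_increment(mask);
    return refcount;
}
```

In this model, run_increment(false) loses the handler's update, while
run_increment(true) accounts for both readers because the handler only runs
after the update has completed.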

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
>

Two problems:

1. It won't help with cases like this:

preempt_disable()
...
preempt_disable()
...
<------- Here
...
preempt_enable()
...
preempt_enable()

If the IPI hits at the point marked above, it is useless: since we are already
inside a nested read-side critical section at that point, we can't switch the
synchronization protocol. We need to wait until we start a fresh, non-nested
read-side critical section in order to switch to the global rwlock.
The reason is that preempt_enable() or put_online_cpus_atomic() can only undo
what its predecessor (preempt_disable()/get_online_cpus_atomic()) did.
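A minimal userspace sketch of that constraint, under stated assumptions: it
models one CPU's reader state with hypothetical names (struct reader_state,
model_get(), model_put(), writer_pending) and plain counters in place of the
real per-CPU refcount and global rwlock. It is not the patchset's
implementation; it only shows why the protocol can be chosen only by the
outermost "get".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which scheme the current read-side section uses. */
enum protocol { PROTO_NONE, PROTO_PERCPU, PROTO_GLOBAL };

struct reader_state {
    int nesting;            /* depth of nested read-side sections */
    enum protocol in_use;   /* chosen once, by the outermost get */
    int percpu_refs;        /* stands in for the per-CPU refcount */
    int global_holds;       /* stands in for read-holds on the global rwlock */
};

static bool writer_pending; /* set when a hotplug writer announces itself */

static void model_get(struct reader_state *r)
{
    /* Only a fresh, non-nested section may pick (or switch) the protocol. */
    if (r->nesting++ == 0)
        r->in_use = writer_pending ? PROTO_GLOBAL : PROTO_PERCPU;

    if (r->in_use == PROTO_GLOBAL)
        r->global_holds++;
    else
        r->percpu_refs++;
}

static void model_put(struct reader_state *r)
{
    /* Undo exactly what the matching get did, whatever the writer says now. */
    if (r->in_use == PROTO_GLOBAL)
        r->global_holds--;
    else
        r->percpu_refs--;

    if (--r->nesting == 0)
        r->in_use = PROTO_NONE;
}

/* Writer shows up at the "Here" point; returns the protocol seen there. */
static enum protocol demo_writer_arrives_while_nested(void)
{
    struct reader_state r = {0};
    enum protocol seen;

    writer_pending = false;
    model_get(&r);
    model_get(&r);
    writer_pending = true;      /* the IPI lands here */
    seen = r.in_use;            /* nested section keeps its old protocol */
    model_put(&r);
    model_put(&r);
    assert(r.percpu_refs == 0 && r.global_holds == 0);
    return seen;
}
```

In this model the nested section finishes on the per-CPU scheme even though the
writer is already pending, and only the next outermost model_get() moves to the
global scheme. If model_put() consulted writer_pending instead of the recorded
choice, it would decrement the wrong counter and leave both unbalanced.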

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again: we unfairly interrupt
every CPU, _even when_ that CPU's existing preempt-disabled sections might
not actually be hotplug readers! (i.e., not bothered about CPU hotplug).

So I think that whatever synchronization scheme we develop must not induce the
very same problems that stop_machine() had. Otherwise, we can end up going full
circle and coming back to the same stop_machine() scenario that we intended to
get rid of.

(That's part of the reason why I initially tried to provide that _light() variant
of the reader APIs in the previous version of the patchset, so that light readers
can remain as undisturbed by CPU hotplug as possible, thereby avoiding indirectly
inducing the "stop_machine effect", as I mentioned here:
https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat
