Re: [PATCH] Correct nr_processes() when CPUs have been unplugged

From: Paul E. McKenney
Date: Wed Nov 04 2009 - 01:09:51 EST

On Tue, Nov 03, 2009 at 01:34:32PM -0500, Christoph Lameter wrote:
> On Tue, 3 Nov 2009, Ingo Molnar wrote:
> > Sidenote: percpu areas currently are kept allocated on x86.
> They must be kept allocated for all possible cpus. Arch code cannot decide
> to not allocate per cpu areas.
> Search for "for_each_possible_cpu" in the source tree if you want more
> detail.

Here are a few in my area:

kernel/rcutorture.c     srcu_torture_stats       523  for_each_possible_cpu(cpu) {
kernel/rcutorture.c     rcu_torture_printk       800  for_each_possible_cpu(cpu) {
kernel/rcutorture.c     rcu_torture_init        1127  for_each_possible_cpu(cpu) {
kernel/rcutree.c        RCU_DATA_PTR_INIT       1518  for_each_possible_cpu(i) { \
kernel/rcutree_trace.c  PRINT_RCU_DATA            73  for_each_possible_cpu(_p_r_d_i) \
kernel/rcutree_trace.c  print_rcu_pendings       237  for_each_possible_cpu(cpu) {
kernel/srcu.c           srcu_readers_active_idx   64  for_each_possible_cpu(cpu)

> > That might change in the future though, especially with virtual systems
> > where the possible range of CPUs can be very high - without us
> > necessarily wanting to pay the percpu area allocation price for it. I.e.
> > dynamic deallocation of percpu areas is something that could happen in
> > the future.
> Could be good but would not be as easy as you may think since core code
> assumes that possible cpus have per cpu areas configured. There will be
> the need for additional notifiers and more complex locking if we want to
> have percpu areas for online cpus only. Per cpu areas are permanent at
> this point which is a good existence guarantee that avoids all sorts of
> complex scenarios.


I will pick on the last one. I would need to track all the srcu_struct
structures. Each such structure would need an additional counter.
In the CPU_DYING notifier, SRCU would need to traverse all srcu_struct
structures, zeroing the dying CPU's count and adding the old value
to the additional counter. This is safe because CPU_DYING happens in
stop_machine_run() context.

Then srcu_readers_active_idx() would need to initialize "sum" to the
additional counter rather than to zero.

Not all -that- bad, and similar strategies could likely be put in place
for the other six offenders in RCU.
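
The folding scheme described above for srcu_readers_active_idx() can be
sketched as a userspace model. This is only an illustration, not kernel
code: struct srcu_model, the model_*() helpers, and the "online" array are
hypothetical names, the real per-CPU counters are replaced by a plain
array, and the two-index layout of the real SRCU counters is collapsed
into a single count per CPU.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>

#define NR_CPUS 4

struct srcu_model {
	long percpu_count[NR_CPUS];	/* stands in for the per-CPU counts */
	long folded;			/* the "additional counter" */
};

/* A reader enters on the given CPU. */
static void model_read_lock(struct srcu_model *sp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	sp->percpu_count[cpu]++;
}

/* A reader exits, possibly on a different CPU than it entered on. */
static void model_read_unlock(struct srcu_model *sp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	sp->percpu_count[cpu]--;
}

/*
 * What a CPU_DYING handler would do for one structure: add the dying
 * CPU's count to the additional counter, then zero the per-CPU count.
 * The model assumes nothing else runs concurrently, which is what the
 * stop-machine context is relied upon to provide in the kernel.
 */
static void model_cpu_dying(struct srcu_model *sp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	sp->folded += sp->percpu_count[cpu];
	sp->percpu_count[cpu] = 0;
}

/* The sum starts from the additional counter rather than from zero. */
static long model_readers_active(const struct srcu_model *sp,
				 const int *online)
{
	long sum = sp->folded;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu])
			sum += sp->percpu_count[cpu];
	return sum;
}
```

Because a reader may exit on a different CPU than it entered on, individual
counts can go negative and only the total is meaningful. Folding a dying
CPU's count into the additional counter leaves that total unchanged while
the sum visits online CPUs only.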

Another class of problems would be from code that did not actually access
an offline CPU's per-CPU variables, but instead implicitly expected the
values to remain across an offline-online event pair. The various
rcu_data per-CPU structures would need some fixups when the CPU came
back online.

One way to approach this would be to have two types of per-CPU variable,
one type with current semantics, and another type that can go away
when the corresponding CPU goes offline. This latter type probably
needs to be set back to the initial values when the corresponding
CPU comes back online.
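
A rough userspace sketch of the two flavors, purely for illustration: the
enum, struct pcpu_var, and the pcpu_*() helpers are hypothetical names and
not an existing kernel API, and malloc() stands in for the per-CPU area
allocation. The persistent flavor is allocated up front for every possible
CPU and keeps its value across offline/online; the transient flavor is
freed at offline and comes back holding its initial value.

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4

enum pcpu_kind { PCPU_PERSISTENT, PCPU_TRANSIENT };

struct pcpu_var {
	enum pcpu_kind kind;
	long init_val;
	long *slot[NR_CPUS];	/* NULL means no storage for that CPU */
};

/* Bring a CPU online: allocate and initialize only if storage is absent. */
static int pcpu_online(struct pcpu_var *v, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (v->slot[cpu])
		return 0;	/* persistent storage survived: keep old value */
	v->slot[cpu] = malloc(sizeof(*v->slot[cpu]));
	if (!v->slot[cpu])
		return -1;
	*v->slot[cpu] = v->init_val;
	return 0;
}

/* Take a CPU offline: only the transient flavor gives up its storage. */
static void pcpu_offline(struct pcpu_var *v, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (v->kind != PCPU_TRANSIENT)
		return;
	free(v->slot[cpu]);
	v->slot[cpu] = NULL;
}

/* Persistent variables get storage for all possible CPUs immediately. */
static int pcpu_init(struct pcpu_var *v, enum pcpu_kind kind, long init_val)
{
	int cpu;

	v->kind = kind;
	v->init_val = init_val;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		v->slot[cpu] = NULL;
	if (kind != PCPU_PERSISTENT)
		return 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (pcpu_online(v, cpu))
			return -1;
	return 0;
}
```

Any code that expects a value to survive an offline-online pair would have
to stay on the persistent flavor, or else save and restore the value
itself.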

Of course, given an "easy way out", one might expect most people to
opt for the old-style per-CPU variables. On the other hand, how
much work do we want to do to save (say) four bytes?

> > Nice one. I'm wondering why it was not discovered for such a long time.
> Cpu hotplug is rarely used (what you listed are rare and unusual cases)
> and therefore online cpus == possible cpus == present cpus.

Though it is not unusual for "possible cpus" to be quite a bit larger
than "online cpus"...

Thanx, Paul