Re: [PATCH] fix rcu vs hotplug race

From: Ingo Molnar
Date: Tue Jun 24 2008 - 07:02:22 EST



* Gautham R Shenoy <ego@xxxxxxxxxx> wrote:

> > hm, not sure - we might just be fighting the symptom and we might
> > now create a silent resource leak instead. Isn't a full RCU quiescent
> > state forced (on all CPUs) before a CPU is cleared out of
> > cpu_online_map? That way the to-be-offlined CPU should never
> > actually show up in rcp->cpumask.
>
> No, this does not happen currently. The rcp->cpumask is always
> initialized to cpu_online_map&~nohz_cpu_mask when we start a new
> batch. Hence, before the batch ends, if a cpu goes offline we _can_
> have a stale rcp->cpumask, till the RCU subsystem has handled its
> CPU_DEAD notification.
>
> Thus for a tiny interval, the rcp->cpumask would contain the offlined
> CPU. One alternative is probably to handle this using the
> CPU_DYING notifier instead of CPU_DEAD, where we can call
> __rcu_offline_cpu().
>
> The WARN_ON that Dhaval was hitting was triggered by a CPU offline
> that ran just before we did a local_irq_save() inside call_rcu().
> At that time the rcp->cpumask was still stale, and hence we ended
> up sending a reschedule IPI (smp_send_reschedule()) to an offlined
> CPU. So the check may not create any resource leak.

The check itself may not, but the problem it highlights might, and with
the patch we'd end up hiding potential problems in this area.
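
For illustration, the window described above can be modelled in user
space roughly as follows. This is only a toy sketch, not kernel code:
cpumask_t is reduced to a plain bitmask, and cpu_bit(), cpu_offline(),
rcu_cpu_dead() and resched_targets() are hypothetical helpers that
stand in for the hotplug path, RCU's CPU_DEAD handling and the IPI
loop respectively.

```c
#include <assert.h>

#define NR_CPUS 8                      /* assumption: small model, 8 CPUs */

typedef unsigned long cpumask_t;       /* toy stand-in: one bit per CPU */

static cpumask_t cpu_online_map;       /* CPUs currently online */
static cpumask_t nohz_cpu_mask;        /* CPUs idle in nohz mode */

struct rcu_ctrlblk {
	cpumask_t cpumask;             /* CPUs that still owe a quiescent state */
};

/* Hypothetical helper: bit for a given CPU number. */
static cpumask_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return 1UL << cpu;
}

/* New batch: snapshot the online CPUs that are not in nohz idle. */
static void rcu_start_batch(struct rcu_ctrlblk *rcp)
{
	rcp->cpumask = cpu_online_map & ~nohz_cpu_mask;
}

/* Hotplug model: the CPU leaves cpu_online_map, rcp->cpumask is untouched. */
static void cpu_offline(int cpu)
{
	cpu_online_map &= ~cpu_bit(cpu);
}

/* Model of RCU handling CPU_DEAD: only now is the stale bit cleared. */
static void rcu_cpu_dead(struct rcu_ctrlblk *rcp, int cpu)
{
	rcp->cpumask &= ~cpu_bit(cpu);
}

/*
 * CPUs that would receive a reschedule IPI from CPU 'self'.
 * check_online != 0 models the check under discussion: mask the
 * targets with cpu_online_map before sending.
 */
static cpumask_t resched_targets(const struct rcu_ctrlblk *rcp,
				 int self, int check_online)
{
	cpumask_t mask = rcp->cpumask & ~cpu_bit(self);

	if (check_online)
		mask &= cpu_online_map;
	return mask;
}
```

With CPUs 0-3 online and CPU 3 in nohz idle, starting a batch and then
offlining CPU 2 leaves CPU 2 in the unchecked target set until
rcu_cpu_dead() runs. The checked variant drops it, but rcp->cpumask
itself stays stale for that interval either way.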

Paul, what do you think about this mixed CPU hotplug plus RCU workload?

Ingo