Re: [RFC][PATCH 2/2] sched: fair-group: per root-domain loadbalancing

From: Peter Zijlstra
Date: Fri Feb 15 2008 - 14:49:23 EST



On Fri, 2008-02-15 at 11:46 -0500, Gregory Haskins wrote:
> Peter Zijlstra wrote:
>
> > @@ -6342,8 +6351,14 @@ static void rq_attach_root(struct rq *rq
> > cpu_clear(rq->cpu, old_rd->span);
> > cpu_clear(rq->cpu, old_rd->online);
> >
> > - if (atomic_dec_and_test(&old_rd->refcount))
> > + if (atomic_dec_and_test(&old_rd->refcount)) {
> > + /*
> > + * sync with active timers.
> > + */
> > + active = lb_monitor_destroy(&old_rd->lb_monitor);
> > +
> > kfree(old_rd);
>
> Note that this works out to be a bug in my code on -rt since you cannot
> kfree() while the raw rq->lock is held. This isn't your problem, per
> se, but just a heads up that I might need to patch this area ASAP.

Yeah, I saw that patch, should be simple enough to fix.

> > +static int load_balance_shares(struct lb_monitor *lb_monitor)
> > +{
> > + int cpu = smp_processor_id();
> > + struct rq *rq = cpu_rq(cpu);
> > + struct root_domain *rd;
> > + struct sched_domain *sd, *sd_prev = NULL;
> > + int state = shares_idle;
> > +
> > + spin_lock(&rq->lock);
> > + /*
> > + * root_domain will stay valid until timer exits - synchronized by
> > + * hrtimer_cancel().
> > + */
> > + rd = rq->rd;
> > + spin_unlock(&rq->lock);
>
> I know we talked about this on IRC, I'm am still skeptical about this
> part of the code. Normally we only guarantee the stability of the
> rq->rd pointer while the rq->lock is held or a rd->refcount is added.

> It would be "safer" to bracket this code with an up/down sequence on the
> rd->refcount,

I initially did break out the up/down reference stuff. It has a few
other complications but all quite tractable.

> but perhaps you can convince me that it is not needed?
> (i.e. I am still not understanding how the timer guarantees the stability).

ok, let me try again.

So we take rq->lock, at this point we know rd is valid.
We also know the timer is active.

So when we release it, the last reference can be dropped and we end up
in the hrtimer_cancel(), right before the kfree().

hrtimer_cancel() will wait for the timer to end. therefore delaying the
kfree() until the running timer finished.

> [up-ref]
>
> > +
> > + /*
> > + * complicated way to find rd->load_balance
> > + */
> > + rcu_read_lock();
> > + for_each_domain(cpu, sd) {
> > + if (!(sd->flags & SD_LOAD_BALANCE))
> > + continue;
> > + sd_prev = sd;
> > + }
> > + if (sd_prev)
> > + state = max(state, rebalance_shares(rd, cpu));
> > + rcu_read_unlock();
> > +
>
> [down-ref]
>
> We would of course need to re-work the drop-ref code so it could be
> freed independent of the rq_attach_root() function, but that should be
> trivial.
>
> > + set_lb_monitor_timeout(lb_monitor, state);
> >
> > return state;
> > }



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/