Re: [RFC PATCHC 3/3] sched/fair: use the idle state info to choose the idlest cpu

From: Nicolas Pitre
Date: Thu Apr 17 2014 - 12:21:43 EST


On Thu, 17 Apr 2014, Daniel Lezcano wrote:

> On 04/17/2014 05:53 PM, Nicolas Pitre wrote:
> > On Thu, 17 Apr 2014, Daniel Lezcano wrote:
> >
> > > Ok, refreshed the patchset but before sending it out I would like to
> > > discuss the rationale of the changes and the policy, and change the
> > > patchset accordingly.
> > >
> > > What order to choose if the cpu is idle ?
> > >
> > > Let's assume all cpus are idle on a dual socket quad core.
> > >
> > > Also, we can reasonably assume that if the cluster is in low power
> > > mode, the cpus belonging to the same cluster are in the same idle state
> > > (leaving aside auto-promotion, over which we have no control).
> > >
> > > If the policy you talk about above is 'aggressive power saving', we can
> > > follow the rules with decreasing priority:
> > >
> > > 1. We want to avoid waking up the entire cluster
> > > => as the cpus are in the same idle state, by choosing a cpu in a
> > > shallow state, we should have the guarantee we won't wake up a cluster
> > > (except if no cpu in a shallow idle state is found).
> >
> > This is unclear to me. Obviously, if an entire cluster is down, that
> > means all the CPUs it contains have been idle for a long time. And
> > therefore they shouldn't be subject to selection unless there are no
> > other CPUs available. Is that what you mean?
>
> Yes, this is what I meant. But what I also meant is that we can set aside,
> for the moment, the cpu topology and the coupled idle states: with the
> approach described, the idle state will be the same for the cpus belonging
> to the same cluster, so we won't select a cluster that is down (except if
> there are no other CPUs available).

CPU topology is needed to properly describe scheduling domains. Whether
we balance across domains or pack using as few domains as possible is a
separate issue. In other words, you shouldn't have to care in this
patch series.

And IMHO coupled C-states are a low-level mechanism that should remain
private to cpuidle; the scheduler shouldn't be aware of them.
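
To illustrate the selection rule under discussion (prefer an idle CPU in
the shallowest idle state, so that a powered-down cluster is only picked as
a last resort), here is a minimal userspace sketch. It is not code from the
patch: struct cpu_idle_info, its fields and pick_shallowest_idle_cpu() are
hypothetical names made up for this example, and it assumes the idle state
depth is available as a plain integer where 0 is the shallowest state.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot for illustration; not a kernel structure. */
struct cpu_idle_info {
	int idle;        /* non-zero if the CPU is currently idle */
	int state_depth; /* 0 = shallowest idle state, larger = deeper */
};

/*
 * Return the index of the idle CPU in the shallowest idle state, or -1 if
 * no CPU is idle. On a tie, the lowest index wins.
 */
static int pick_shallowest_idle_cpu(const struct cpu_idle_info *cpus, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue; /* busy CPUs are not candidates here */
		if (best < 0 || cpus[i].state_depth < cpus[best].state_depth)
			best = (int)i;
	}
	return best;
}
```

Since the CPUs of a powered-down cluster all report the same deep state,
they lose to any CPU in a shallower state without the function knowing
anything about topology.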

> > > 2. We want to avoid waking up a cpu which did not reach the target
> > > residency time (will need some work to unify cpuidle idle time and idle
> > > task run time)
> > > => with the target residency and, as a first step, with the idle stamp,
> > > we can determine if the cpu slept enough
> >
> > Agreed. However, right now, the scheduler does not have any
> > consideration for that. So this should be done as a separate patch.
>
> Yes, I thought that as a very first step we can rely on the idle stamp, with
> a big comment, until we unify the times. Or I can first unify the idle times
> and then take into account the target residency. It is to comply with
> Rafael's request to have the 'big picture'.

I agree, but that should be done incrementally. Even without this
consideration, what you proposed is already an improvement over the
current state of affairs.
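
For reference, the target residency check mentioned in point 2 could look
roughly like the sketch below. This is only an illustration in plain
userspace C under simple assumptions: struct idle_cpu, its fields and
slept_enough() are hypothetical names, and both times are taken to be
nanosecond counters from the same clock.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for one idle CPU; not a kernel structure. */
struct idle_cpu {
	uint64_t idle_stamp_ns;       /* time at which the CPU went idle */
	uint64_t target_residency_ns; /* minimum stay for the idle state to pay off */
};

/*
 * Return 1 if the CPU has been idle for at least its target residency,
 * 0 otherwise. A timestamp earlier than the idle stamp is treated as
 * "not slept enough" rather than letting the subtraction wrap around.
 */
static int slept_enough(const struct idle_cpu *cpu, uint64_t now_ns)
{
	if (now_ns < cpu->idle_stamp_ns)
		return 0;
	return now_ns - cpu->idle_stamp_ns >= cpu->target_residency_ns;
}
```

A CPU for which this returns 0 would be a poor wakeup target, since the
cost of entering its idle state has not been amortized yet.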


Nicolas