Re: [PATCH v5 3/7] sched: set initial value of runnable avg for newforked task

From: Morten Rasmussen
Date: Thu May 09 2013 - 06:55:20 EST


On Wed, May 08, 2013 at 01:00:34PM +0100, Paul Turner wrote:
> On Wed, May 8, 2013 at 4:34 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Tue, May 07, 2013 at 04:20:55AM -0700, Paul Turner wrote:
> >> Yes, 1024 was only intended as a starting point. We could also
> >> arbitrarily pick something larger, the key is that we pick
> >> _something_.
> >>
> >> If we wanted to be more exacting about it we could just give them a
> >> sched_slice() worth; this would have a few obvious "nice" properties
> >> (pun intended).
> >
> > Oh I see I misunderstood again :/ It's not about the effective load but the weight
> > of the initial effective load wrt adjustment.
> >
> > Previous schedulers didn't have this aspect at all, so no experience from me
> > here. Paul would be the one, since he's run longest with this stuff.
> >
> > That said, I would tend to keep it shorter rather than longer so that it would
> > adjust quicker to whatever it really wanted to be.
> >
> > Morten says the load is unstable specifically on loaded systems.
>
> Here, Morten was (I believe) referring to the stability at task startup.
>
> To be clear:
> Because we have such a small runnable period denominator at this point
> a single changed observation (for an equivalently behaving thread)
> could have a very large effect. e.g. fork/exec -- happen to take a
> major #pf, observe a "relatively" long initial block.
>
> By associating an initial period (along with our full load_contrib)
> here, we're making the denominator larger so that these effects are
> less pronounced; achieving better convergence towards what our load
> contribution should actually be.

This is exactly what I meant, thanks :)

For the workloads we are looking at we frequently see tasks that get
blocked for short amounts of time shortly after the task was created. As
you already explained, the small denominator causes the tracked load
to change very quickly until the denominator gets larger.

I think it makes good sense to initialize the period and sum (to be
conservative) to some appropriate value to get a more stable
tracked load for new tasks.

Morten

>
> Also: We do this conservatively, by converging down, not up.
>
> > I would think
> > this is because we'd experience scheduling latency, we're runnable more pushing
> > things up. But if we're really an idle task at heart we'd not run again for a
> > long while, pushing things down again.
>
> Exactly, this is why we must be careful to use instantaneous weights
> about wake-up decisions. Interactive and background tasks are largely
> idle.
>
> While this is exactly how we want them to be perceived from a
> load-balance perspective it's important to keep in mind that while
> wake-up placement has a very important role in the overall balance of
> a system, it is not playing quite the same game as the load-balancer.
>
> >
> > So on that point Paul's suggestion of maybe starting with __sched_slice() might
> > make sense because it increases the weight of the initial avg with nr_running.
> > Not sure really, we'll have to play and see what works best for a number of
> > workloads.
>
