Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: Yuyang Du
Date: Tue Sep 22 2015 - 03:28:20 EST


On Mon, Sep 21, 2015 at 10:30:04AM -0700, bsegall@xxxxxxxxxx wrote:
> > But first, I think as load_sum and load_avg can afford NICE_0_LOAD with either high
> > or low resolution. So we have no reason to have low resolution (10bits) load_avg
> > when NICE_0_LOAD has high resolution (20bits), because load_avg = runnable% * load,
> > as opposed to now we have load_avg = runnable% * scale_load_down(load).
> >
> > We get rid of all scale_load_down() for runnable load average?
>
> Hmm, LOAD_AVG_MAX * prio_to_weight[0] is 4237627662, ie barely within a
> 32-bit unsigned long, but in fact LOAD_AVG_MAX * MAX_SHARES is already
> going to give errors on 32-bit (even with the old code in fact). This
> should probably be fixed... somehow (dividing by 4 for load_sum on
> 32-bit would work, though be ugly. Reducing MAX_SHARES by 2 bits on
> 32-bit might have made sense but would be a weird difference between 32
> and 64, and could break userspace anyway, so it's presumably too late
> for that).
>
> 64-bit has ~30 bits free, so this would be fine so long as SLR is 0 on
> 32-bit.
>

load_avg has no LOAD_AVG_MAX term in it, it is runnable% * load, IOW, load_avg <= load.
So, on 32bit, cfs_rq's load_avg can host 2^32/prio_to_weight[0]/1024 = 47, with 20bits
load resolution. This is ok, because struct load_weight's load is also unsigned
long. If overflown, cfs_rq->load.weight will be overflown in the first place.

However, after a second thought, this is not quite right. Because load_avg is not
necessarily no greater than load, since load_avg has blocked load in it. Although,
load_avg is still at the same level as load (converging to be <= load), we may not
want the risk to overflow on 32bit.

> > +/*
> > + * NICE_0's weight (visible to user) and its load (invisible to user) have
> > + * independent resolution, but they should be well calibrated. We use scale_load()
> > + * and scale_load_down(w) to convert between them, the following must be true:
> > + * scale_load(prio_to_weight[20]) == NICE_0_LOAD
> > + */
> > #define NICE_0_LOAD SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT SCHED_LOAD_SHIFT
>
> I still think tying the scale_load shift to be the same as the
> SCHED_CAPACITY/etc shift is silly, and tying the NICE_0_LOAD/SHIFT in is
> worse. Honestly if I was going to change anything it would be to define
> NICE_0_LOAD/SHIFT entirely separately from SCHED_LOAD_SCALE/SHIFT.

If NICE_0_LOAD is nice-0's load, and if SCHED_LOAD_SHIFT is to say how to get
nice-0's load, I don't understand why you want to separate them.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/