Re: [PATCH 06/11] sched/irq: add irq utilization tracking

From: Vincent Guittot
Date: Tue Jul 31 2018 - 04:21:33 EST


On Tue, 31 Jul 2018 at 05:32, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:

> > > >
> > > > #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> > > > if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> > > > - sched_rt_avg_update(rq, irq_delta + steal);
> > > > + update_irq_load_avg(rq, irq_delta + steal);
> > >
> > > I think we should not add steal time into irq load tracking. Steal
> > > time is always 0 on a native kernel, so it doesn't matter there, but
> > > what happens when a guest disables IRQ_TIME_ACCOUNTING and enables
> > > PARAVIRT_TIME_ACCOUNTING? Steal time is not the real irq util_avg. In
> > > addition, we haven't exposed power management for performance, which
> > > means that e.g. the schedutil governor cannot cooperate with the
> > > passive-mode intel_pstate driver to tune the OPP. Decaying the old
> > > steal time avg and adding the new one just wastes CPU cycles.
> >
> > In fact, I have kept the same behavior as with rt_avg, which was
> > already adding steal time when computing scale_rt_capacity, which is
> > used to reflect the remaining capacity for FAIR tasks and is used in
> > load balance. I'm not sure that it's worth using different variables
> > for irq and steal.
> > That being said, I see a possible optimization in schedutil when
> > PARAVIRT_TIME_ACCOUNTING is enabled and IRQ_TIME_ACCOUNTING is disabled.
> > With this kind of config, scale_irq_capacity can be a nop for
> > schedutil but still scale the utilization for scale_rt_capacity.
>
> Yeah, this is what I had in mind before; you can make a patch for that. :)

OK, I'm going to prepare a patch.
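
To make the idea concrete, here is a rough userspace sketch of the
optimization discussed above. It is not the patch itself and not kernel
code: MODEL_IRQ_TIME_ACCOUNTING is a hypothetical stand-in for
CONFIG_IRQ_TIME_ACCOUNTING, and scale_by_irq(), capacity_view() and
schedutil_view() are made-up names used only for illustration. It assumes
the scaling is util * (max - irq) / max, where irq is the tracked
irq (plus steal) utilization.

```c
/* Userspace model only; illustrative names, not the kernel implementation. */

/* Hypothetical stand-in for CONFIG_IRQ_TIME_ACCOUNTING (0 = disabled). */
#ifndef MODEL_IRQ_TIME_ACCOUNTING
#define MODEL_IRQ_TIME_ACCOUNTING 0
#endif

/* Scale util by the share of capacity not consumed by irq/steal time. */
static unsigned long scale_by_irq(unsigned long util, unsigned long irq,
				  unsigned long max)
{
	if (irq >= max)
		return 0;	/* nothing left for other classes */

	return util * (max - irq) / max;
}

/* Load-balance side: always scaled, so steal time still reduces capacity. */
static unsigned long capacity_view(unsigned long util, unsigned long irq,
				   unsigned long max)
{
	return scale_by_irq(util, irq, max);
}

/* schedutil side: a nop when only steal time is being tracked. */
static unsigned long schedutil_view(unsigned long util, unsigned long irq,
				    unsigned long max)
{
#if MODEL_IRQ_TIME_ACCOUNTING
	return scale_by_irq(util, irq, max);
#else
	(void)irq;
	(void)max;
	return util;
#endif
}
```

With util = 512, irq = 256 and max = 1024, capacity_view() yields 384
while schedutil_view() leaves util at 512 in the steal-only config.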

Thanks

>
> Regards,
> Wanpeng Li