Re: cputime takes cstate into consideration

From: Peter Zijlstra
Date: Wed Jun 26 2019 - 12:16:29 EST


On Wed, Jun 26, 2019 at 10:54:13AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 26, 2019 at 12:33:30PM +0200, Thomas Gleixner wrote:
> > On Wed, 26 Jun 2019, Wanpeng Li wrote:
> > > After exposing mwait/monitor to a kvm guest, the guest can make the
> > > physical cpu enter a deeper cstate through the mwait instruction. However,
> > > the top command on the host still observes 100% cpu utilization, since the
> > > qemu process is running even though the guest, which has the power
> > > management capability, executes mwait. With powertop on the host we can
> > > in fact observe that the physical cpu has already entered a deeper cstate.
> > > Could we take cstate into consideration when accounting cputime etc?
> >
> > If MWAIT can be used inside the guest then the host cannot distinguish
> > between execution and being stuck in mwait.
> >
> > It'd need to poll the power monitoring MSRs on every occasion where the
> > accounting happens.
> >
> > This completely falls apart when you have a zero exit guest (think
> > NOHZ_FULL). Then you'd have to bring the guest out with an IPI to access
> > the per CPU MSRs.
> >
> > I assume a lot of people will be happy about all that :)
>
> There were some ideas that Ankur (CC-ed) mentioned to me of using the perf
> counters (in the host) to sample the guest and construct a better
> accounting picture of what the guest does. That way the dashboard
> on the host would not show 100% CPU utilization.

But then you generate extra noise and vmexits on those cpus, just to get
this accounting sorted, which sounds like a bad trade.
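
To make the accounting idea from the quoted discussion concrete, here is a minimal sketch of the arithmetic only. It assumes two samples of a free-running timestamp counter and of a C-state residency counter for the same CPU, and it assumes both counters tick at the same rate. The function names and the sampling model are hypothetical and do not come from this thread. The sketch deliberately leaves out how the samples are obtained, which is exactly the expensive part discussed above (MSR reads, IPIs, vmexits).

```python
def counter_delta(start, end, bits=64):
    """Difference between two samples of a free-running counter.

    Masking handles a single wraparound of a `bits`-wide counter.
    """
    mask = (1 << bits) - 1
    return (end - start) & mask


def active_fraction(tsc_delta, residency_delta):
    """Fraction of the interval the CPU was NOT in the sampled C-state.

    Assumption: the residency counter ticks at the same rate as the
    timestamp counter, so the ratio of the deltas is a time fraction.
    """
    if tsc_delta <= 0:
        raise ValueError("tsc_delta must be positive")
    # Clamp to [0, 1] in case the two counters were sampled slightly apart.
    idle = min(max(residency_delta / tsc_delta, 0.0), 1.0)
    return 1.0 - idle


if __name__ == "__main__":
    # Hypothetical samples: (timestamp counter, residency counter).
    t0, r0 = 1_000_000, 200_000
    t1, r1 = 2_000_000, 950_000
    frac = active_fraction(counter_delta(t0, t1), counter_delta(r0, r1))
    print(f"active fraction: {frac:.2f}")
```

With the made-up samples above, a vCPU thread that top reports as 100% busy would be scaled down to a quarter of the interval, since three quarters of it was spent in the deeper C-state.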