Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

From: Michal Hocko
Date: Thu Mar 23 2023 - 08:18:26 EST


On Thu 23-03-23 07:52:22, Marcelo Tosatti wrote:
> On Thu, Mar 23, 2023 at 08:51:14AM +0100, Michal Hocko wrote:
> > On Wed 22-03-23 11:20:55, Marcelo Tosatti wrote:
> > > On Wed, Mar 22, 2023 at 02:35:20PM +0100, Michal Hocko wrote:
> > [...]
> > > > > "Performance details for the kworker interruption:
> > > > >
> > > > > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > > > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > > > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > > > >
> > > > > The example above shows an additional 7us for the
> > > > >
> > > > > oslat -> kworker -> oslat
> > > > >
> > > > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > > > observed in the guest is higher than 50us, violating the acceptable
> > > > > latency threshold for certain applications."
> > > >
> > > > Yes, I have seen that but it doesn't really give a wider context to
> > > > understand why those numbers matter.
> > >
> > > OK.
> > >
> > > "In the case of RAN, a MAC scheduler with TTI=1ms, this causes >100us
> > > interruption observed in a guest (which is above the safety
> > > threshold for this application)."
> > >
> > > Is that OK?
> >
> > This might be a sufficient information for somebody familiar with the
> > matter (not me). So no, not enough. We need to hear a more complete
> > story.
>
> Michal,
>
> Please refer to
> https://www.diva-portal.org/smash/get/diva2:541460/FULLTEXT01.pdf
>
> 2.3 Channel Dependent Scheduling
> The purpose of scheduling is to decide which terminal will transmit data on which set
> of resource blocks with what transport format to use. The objective is to assign
> resources to the terminal such that the quality of service (QoS) requirement is fulfilled.
> Scheduling decision is taken every 1 ms by base station (termed as eNodeB) as the
> same length of Transmission Time Interval (TTI) in LTE system.
>
> In general:
>
> https://en.wikipedia.org/wiki/Real-time_computing

Thank you, but this is not what I have been asking for (repeatedly). I
am well aware of what RT computing is about and I am not interested in
generic fluff. I am asking about the specific use cases you have in mind
when pushing these changes.

> For example, for the MAC scheduler processing must occur every 1ms,
> and a certain amount of computation takes place (and must finish before
> the next 1ms timeframe). A > 50us latency spike as observed by cyclictest
> is considered a "failure".

OK, you are claiming that much, but you are not filling in the other
holes in your story. Let me outline a few questions I have. Your
measurements show a 7us overhead that the vmstat processing might add.
That is far from the > 50us mentioned above. You suggest that the
difference is an effect of the workload running in a guest, but without
giving more details. I am quite surprised to hear about RT expectations
inside a guest system TBH.

All that being said, it would be really helpful if you were more
specific about the workload and about why there is no other way than
making the vmstat infrastructure more complex (it is already quite
complex as it is).

Thanks!

--
Michal Hocko
SUSE Labs