Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

From: Marcelo Tosatti
Date: Thu Mar 23 2023 - 09:31:27 EST

In reply to: Michal Hocko: "Re: [PATCH v7 00/13] fold per-CPU vmstats remotely"
Next in thread: Marcelo Tosatti: "Re: [PATCH v7 00/13] fold per-CPU vmstats remotely"

On Thu, Mar 23, 2023 at 01:17:32PM +0100, Michal Hocko wrote:
> On Thu 23-03-23 07:52:22, Marcelo Tosatti wrote:
> > On Thu, Mar 23, 2023 at 08:51:14AM +0100, Michal Hocko wrote:
> > > On Wed 22-03-23 11:20:55, Marcelo Tosatti wrote:
> > > > On Wed, Mar 22, 2023 at 02:35:20PM +0100, Michal Hocko wrote:
> > > [...]
> > > > > > "Performance details for the kworker interruption:
> > > > > >
> > > > > > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > > > > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > > > > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > > > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > > > > >
> > > > > > The example above shows an additional 7us for the
> > > > > >
> > > > > > oslat -> kworker -> oslat
> > > > > >
> > > > > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > > > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > > > > observed in the guest is higher than 50us, violating the acceptable
> > > > > > latency threshold for certain applications."
> > > > >
> > > > > Yes, I have seen that but it doesn't really give a wider context to
> > > > > understand why those numbers matter.
> > > >
> > > > OK.
> > > >
> > > > "In the case of RAN, a MAC scheduler with TTI=1ms, this causes >100us
> > > > interruption observed in a guest (which is above the safety
> > > > threshold for this application)."
> > > >
> > > > Is that OK?
> > >
> > > This might be a sufficient information for somebody familiar with the
> > > matter (not me). So no, not enough. We need to hear a more complete
> > > story.
> >
> > Michal,
> >
> > Please refer to
> > https://www.diva-portal.org/smash/get/diva2:541460/FULLTEXT01.pdf
> >
> > 2.3 Channel Dependent Scheduling
> > The purpose of scheduling is to decide which terminal will transmit data on which set
> > of resource blocks with what transport format to use. The objective is to assign
> > resources to the terminal such that the quality of service (QoS) requirement is fulfilled.
> > Scheduling decision is taken every 1 ms by base station (termed as eNodeB) as the
> > same length of Transmission Time Interval (TTI) in LTE system.
> >
> > In general:
> >
> > https://en.wikipedia.org/wiki/Real-time_computing
>
> Thank you, but not something I was really asking for (repeatedly). I am
> pretty aware of what RT computing is about. I am not really interested
> in a generic fluff. I am asking about specific usecases you have in mind
> when pushing these changes.
>
> > For example, for the MAC scheduler processing must occur every 1ms,
> > and a certain amount of computation takes place (and must finish before
> > the next 1ms timeframe). A > 50us latency spike as observed by cyclictest
> > is considered a "failure".
>
> OK, you are claiming that much but you are not really filling up other
> holes in your story. Let me just outline few questions I have. Your
> measurements talk about 7us overhead the vmstat processing might add.
> This is really far from > 50us above.

7us in the host, for the following sched_switch events:

oslat -> kworker
kworker -> oslat

However, the impact is larger for a virtualized application, that is,
oslat running in a guest whose vcpu is executed by the qemu-vcpu
process in the host. In that case the sequence is:

oslat executing
qemu-vcpu VM-EXIT
qemu-vcpu -> kworker
kworker -> qemu-vcpu
qemu-vcpu VM-ENTRY

and the latency observed in the guest is much higher than the 7us
(it can be above 100us).
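
For reference, the 7us host-side figure can be reproduced from the
timestamps in the trace quoted earlier in this thread. The following is
only a minimal sketch in Python. It assumes the 7us is measured from the
workqueue_queue_work event to the switch back to oslat, and the names
(TRACE, delta_us, share_of_tti) are made up for illustration, not taken
from any tracing tool. The last helper relates the 50us and 100us
figures to the 1ms TTI mentioned above.

```python
from decimal import Decimal

# Timestamps (in seconds) copied from the quoted trace.
# Decimal avoids binary floating-point rounding on microsecond deltas.
TRACE = {
    "sys_mlock": Decimal("1094.456862"),
    "workqueue_queue_work": Decimal("1094.456971"),
    "switch oslat -> kworker": Decimal("1094.456974"),
    "switch kworker -> oslat": Decimal("1094.456978"),
}

TTI_US = 1000  # 1ms Transmission Time Interval, in microseconds


def delta_us(start, end):
    """Elapsed time between two trace timestamps, in whole microseconds."""
    return int((end - start) * 1_000_000)


def share_of_tti(latency_us):
    """Latency expressed as a percentage of one TTI."""
    return 100 * latency_us / TTI_US


# Assumption: the interruption starts when vmstat_update is queued and
# ends when oslat is switched back in.
queue_to_resume = delta_us(TRACE["workqueue_queue_work"],
                           TRACE["switch kworker -> oslat"])

# Time the kworker itself occupied the CPU.
kworker_on_cpu = delta_us(TRACE["switch oslat -> kworker"],
                          TRACE["switch kworker -> oslat"])

print("queue -> resume:", queue_to_resume, "us")
print("kworker on CPU: ", kworker_on_cpu, "us")
print("50us spike as share of TTI: ", share_of_tti(50), "%")
print("100us spike as share of TTI:", share_of_tti(100), "%")
```

With the timestamps above this gives 7us from queueing to resume, of
which the kworker is on the CPU for 4us. The guest-side numbers (above
50us or 100us) come from separate measurements and cannot be derived
from this host trace.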

> You suggest that this is an effect
> of the workload running in a guest without more details. I am quite
> surprised to hear about RT expectations inside a guest system TBH.

https://www.youtube.com/watch?v=zIDwc6uDszY

> All that being said, it would be really helpful if you were more
> specific about the workload and why there is no other way but making
> vmstat infrastructure more complex (it is quite complex on its own).

The patchset is just changing vmstat_shepherd from happening locally
to happening remotely. There are a number of algorithms in the kernel
that deal with concurrent access already.

Where do you think this particular patchset makes things complicated,
and what can be done to make it simpler?
