Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall

From: Jan Kiszka
Date: Mon Sep 25 2017 - 14:29:41 EST


On 2017-09-25 12:41, Thomas Gleixner wrote:
> On Sun, 24 Sep 2017, Marcelo Tosatti wrote:
>> On Fri, Sep 22, 2017 at 03:01:41PM +0200, Peter Zijlstra wrote:
>> What the patch does is the following:
>> It reduces the window where SCHED_FIFO is applied to vcpu0
>> to those sections where a spinlock is shared between -RT vcpus and vcpu0
>> (why: because otherwise, when the emulator thread is sharing a
>> pCPU with vcpu0, it is unable to generate interrupts for vcpu0).
>>
>> And it's being rejected because:
>> Please fill in.
>
> Your patch is just papering over one particular problem, but it's not
> fixing the root cause. That's the worst engineering approach and we all
> know how fast this kind of crap falls over.
>
> There are enough other issues which can cause starvation of the RT VCPUs
> when the housekeeping VCPU is preempted, not just the particular problem
> which you observed.
>
> Back then when I did the first prototype of RT in KVM, I made it entirely
> clear, that you have to spend one physical CPU for _each_ VCPU, independent
> whether the VCPU is reserved for RT workers or the housekeeping VCPU. The
> emulator thread needs to run on a separate physical CPU.
>
> If you want to run the housekeeping VCPU and the emulator thread on the
> same physical CPU then you have to make sure that both the emulator and the
> housekeeper side of affairs are designed and implemented with RT in
> mind. As long as that is not the case, you simply cannot run them on the
> same physical CPU. RT is about guarantees and guarantees cannot be achieved
> with bandaid engineering.

It's even more complicated for the guest: it needs to be aware of the
latencies that its interaction with a VM, rather than a real machine, may
cause while it is inside whatever critical sections it holds. That's an
additional design dimension that would be very hard to establish and
maintain, even in Linux.

The only way around that is to truly decouple guest CPUs via full core
isolation inside the Linux guest and have your RT guest application
exploit this partitioning, e.g. by using lock-less inter-core
communication without kernel help.
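Lock-less inter-core communication of that kind can be sketched as a single-producer/single-consumer ring buffer built on C11 atomics, so the RT core never blocks on a lock the housekeeping core could be holding. This is an illustrative sketch, not code from the patch under discussion:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16U  /* must be a power of two */

/* One producer core pushes, one consumer core pops; no locks needed. */
struct spsc_ring {
    _Atomic uint32_t head;      /* written only by the producer */
    _Atomic uint32_t tail;      /* written only by the consumer */
    uint32_t buf[RING_SIZE];
};

static bool ring_push(struct spsc_ring *r, uint32_t v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;           /* ring full */
    r->buf[head & (RING_SIZE - 1)] = v;
    /* release pairs with the consumer's acquire load of head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct spsc_ring *r, uint32_t *v)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;           /* ring empty */
    *v = r->buf[tail & (RING_SIZE - 1)];
    /* release pairs with the producer's acquire load of tail */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, neither side ever spins on a lock, which is exactly what makes the scheme immune to the housekeeping-vCPU-preemption problem discussed above.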

The reason I was playing with PV-sched back then was to explore how you
could map the guest's task prio dynamically on its host vcpu. That
involved boosting whenever an event (aka IRQ) came in for the guest
vcpu. It turned out to be a more or less working solution looking for a
real-world problem.

Jan

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux