Re: [RFC PATCH 00/13] Core scheduling v5

From: Peter Zijlstra
Date: Fri Apr 17 2020 - 09:09:03 EST


On Fri, Apr 17, 2020 at 02:35:38PM +0200, Alexander Graf wrote:
> On 17.04.20 13:12, Peter Zijlstra wrote:

> If we first kick out the sibling HT for every #VMEXIT, performance will be
> abysmal, no?

I've been given to understand that people serious about virt try really
hard to avoid VMEXIT.


> > That doesn't completely solve things I think. Even if you run all
> > untrusted tasks as core exclusive, you still have a problem of them vs
> > interrupts on the other sibling.
> >
> > You need to somehow arrange all interrupts to the core happen on the
> > same sibling that runs your untrusted task, such that the VERW on
> > return-to-userspace works as intended.
> >
> > I suppose you can try and play funny games with interrupt routing tied
> > to the force-idle state, but I'm dreading what that'll look like. Or
> > were you going to handle this from your irq_enter() thing too?
>
> I'm not sure I follow. We have thread local interrupts (timers, IPIs) and
> device interrupts (network, block, etc).
>
> Thread local ones shouldn't transfer too much knowledge, so I'd be inclined
> to say we can just ignore that attack vector.
>
> Device interrupts we can easily route to HT0. If we now make "core
> exclusive" a synonym for "always run on HT0", we can guarantee that they
> always land on the same CPU, no?
>
> Then you don't need to hook into any idle state tracking, because you always
> know which CPU the "safe" one to both schedule tasks and route interrupts to
> is.

That would come apart most mighty when someone does an explicit
sched_setaffinity() for !HT0.

While that might work for some relatively contained systems like
ChromeOS, it will not work in general, I think.
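
To make the sched_setaffinity() point concrete, here is a rough userspace
sketch of how any task can move itself off HT0. It is not from the patch
set. It assumes "HT0" means the lowest-numbered thread of the core and that
the sysfs thread_siblings_list file for cpu0 is readable; the helper names
(non_ht0_sibling, pin_self_to_cpu, pin_off_ht0) are made up for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: given a sibling list as sysfs prints it
 * ("0,4" or "0-1"), return the second thread of the core, i.e. the
 * one that is NOT HT0 (assuming HT0 is the lowest-numbered thread).
 * Returns -1 if the core has a single thread.
 */
int non_ht0_sibling(const char *list)
{
	char *end;
	long first = strtol(list, &end, 10);

	if (end == list)
		return -1;		/* no number at all */
	if (*end == '-') {		/* range form, e.g. "0-1" */
		long last = strtol(end + 1, &end, 10);
		return last > first ? (int)(first + 1) : -1;
	}
	if (*end == ',') {		/* list form, e.g. "0,4" */
		const char *p = end + 1;
		long next = strtol(p, &end, 10);
		return end != p ? (int)next : -1;
	}
	return -1;			/* single CPU: SMT off or no sibling */
}

/* Restrict the calling task to exactly one CPU. */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* pid 0 == self */
}

/* Look up core 0's siblings and pin the caller to the non-HT0 one. */
int pin_off_ht0(void)
{
	char buf[64];
	int cpu;
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	cpu = non_ht0_sibling(buf);
	return cpu < 0 ? -1 : pin_self_to_cpu(cpu);
}
```

Once pin_off_ht0() returns 0 the task only ever runs on the !HT0 thread, so
a scheme where device interrupts are routed to HT0 and "core exclusive"
means "always run on HT0" would either have to override the affinity mask
or give up its guarantee for that task.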