Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

From: Christoffer Dall
Date: Thu Sep 05 2019 - 08:16:43 EST


Hi Heinrich,

On Thu, Sep 05, 2019 at 02:01:36PM +0200, Heinrich Schuchardt wrote:
> On 9/5/19 11:20 AM, Stefan Hajnoczi wrote:
> > On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote:
> > > If an application tries to access memory that is not mapped, an error
> > > ENOSYS, "load/store instruction decoding not implemented" may occur.
> > > QEMU will hang with a register dump.
> > >
> > > Instead create a data abort that can be handled gracefully by the
> > > application running in the virtual environment.
> > >
> > > Now the virtual machine can react to the event in the most appropriate
> > > way - by recovering, by writing an informative log, or by rebooting.
> > >
> > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@xxxxxx>
> > > ---
> > > virt/kvm/arm/mmio.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> > > index a8a6a0c883f1..0cbed7d6a0f4 100644
> > > --- a/virt/kvm/arm/mmio.c
> > > +++ b/virt/kvm/arm/mmio.c
> > > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> > > >  		if (ret)
> > > >  			return ret;
> > > >  	} else {
> > > > -		kvm_err("load/store instruction decoding not implemented\n");
> > > > -		return -ENOSYS;
> > > > +		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> > > > +		return 1;
> >
> > I see this more as a temporary debugging hack than something to merge.
> >
> > It sounds like in your case the guest environment provided good
> > debugging information and you preferred it over debugging this from the
> > host side. That's fine, but allowing the guest to continue running in
> > the general case makes it much harder to track down the root cause of a
> > problem because many guest CPU instructions may be executed after the
> > original problem occurs. Other guest software may fail silently in
> > weird ways. IMO it's best to fail early.
> >
> > Stefan
> >
>
> As virtual machines are ubiquitous, expect mission critical systems
> to run on them as well. At development time halting a machine may be
> a good idea. In production this is often the worst solution. Rebooting
> may be essential for survival.
>
> For an anecdotal example see:
> https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html
>
> I am convinced that leaving it to the guest to decide how to react is
> the best choice.
>
Maintaining strong adherence to the architecture is equally important,
and I'm sure we can find anecdotes showing that deviating from the
expected behavior can also lead to disastrous outcomes.

Have you had a look at the suggested patch I sent? The idea is that we
can preserve the existing legacy ABI, allow for a better debugging
experience, allow userspace to do emulation if it so wishes, and provide
a better error message if userspace doesn't handle this properly.

One thing we could change from my proposed patch would be to have KVM
inject an external abort for the access if the target address also
doesn't hit an MMIO device, which is by far the most common scenario
reported here on the list.
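
To make the intended decision concrete, here is a minimal standalone
sketch of it in C. It is only a model, not the proposed patch and not
kernel or QEMU code: struct mmio_region, enum nisv_action and
classify_undecoded_access() are hypothetical names made up for this
illustration, and the region table stands in for whatever the real
code would consult to find out whether an address hits a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one guest-physical MMIO window (not a KVM type). */
struct mmio_region {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical outcomes for an access the hardware gave no syndrome for. */
enum nisv_action {
	NISV_EMULATE,		/* hits a device: hand to userspace to emulate */
	NISV_EXTERNAL_ABORT,	/* hits nothing: report an external abort */
};

static bool hits_region(const struct mmio_region *r, uint64_t gpa)
{
	/* Written as a subtraction so base + size cannot overflow. */
	return gpa >= r->base && gpa - r->base < r->size;
}

/* Decide what to do with an undecodable access to guest address gpa. */
enum nisv_action classify_undecoded_access(const struct mmio_region *regions,
					   size_t n, uint64_t gpa)
{
	assert(regions != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (hits_region(&regions[i], gpa))
			return NISV_EMULATE;
	}

	/* The common case reported on the list: nothing is mapped there. */
	return NISV_EXTERNAL_ABORT;
}
```

With a table holding a single 0x1000-byte window at 0x09000000, an
access to 0x09000004 would be classified as NISV_EMULATE, and an access
to 0x40000000 as NISV_EXTERNAL_ABORT.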

Hopefully, a mission-critical deployment based on KVM/Arm (scary as that
sounds) would use a recent and patched VMM (QEMU) that either causes
the external abort or reboots the VM, as per the configuration of the
particular system in question.


Thanks,

Christoffer