Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

From: Heinrich Schuchardt
Date: Thu Sep 05 2019 - 04:48:57 EST


On 9/5/19 10:16 AM, Peter Maydell wrote:
> On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@xxxxxxxxxx> wrote:
>> How can you tell that the access would fault? You have no idea at that
>> stage (the kernel doesn't know about the MMIO ranges that userspace
>> handles). All you know is that you're faced with a memory access that
>> you cannot emulate in the kernel. Injecting a data abort at that stage
>> is not something that the architecture allows.
>
> To be fair, locking up the whole CPU (which is effectively
> what the kvm_err/ENOSYS is going to do to the VM) isn't
> something the architecture allows either :-)
>
>> Of course, the best thing would be to actually fix the guest so that
>> it doesn't use non-emulatable MMIO accesses. In general, that's the sign
>> of a bug in low-level accessors.
>
> This is true, but the problem is that barfing out to userspace
> makes it harder to debug the guest because it means that
> the VM is immediately destroyed, whereas AIUI if we
> inject some kind of exception then (assuming you're set up
> to do kernel-debug via gdbstub) you can actually examine
> the offending guest code with a debugger because at least
> your VM is still around to inspect...

Stopping the CPU and debugging is not what I am interested in. I want
the QEMU guest to be able to react to an incorrect memory access.

Imagine Apollo 11's computer not restarting when hitting an exception.
They would never have reached the moon. I think allowing an emulated
guest to react to an exception, e.g. by resetting, is a necessity.

In my case, U-Boot running as a guest prints output like the following
when a data abort occurs:

"Synchronous Abort" handler, esr 0x02000000
elr: fffffffffdeac19c lr : fffffffffdeac19c (reloc)
elr: 000000007ddd719c lr : 000000007ddd719c
x0 : 0000000000000000 x1 : 000000007ffbc000
x2 : 000000000000000a x3 : 000000007ffbcd80
x4 : 0000000000002800 x5 : 000000007ffbcdb0
x6 : 0000000000000001 x7 : 000000007eef8b80
x8 : 000000000000003f x9 : 0000000000000004
x10: 0000000000000001 x11: 000000000000000d
x12: 0000000000000006 x13: 000000000001869f
x14: 0000000047f00000 x15: 0000000000000000
x16: 000000007ff5b194 x17: 0000000000000000
x18: 0000000000000000 x19: 000000007ffbcd30
x20: 0000000000000000 x21: 000000007ffeb000
x22: 0000000000000009 x23: 000000007eef5cf0
x24: 0000000000000000 x25: 000000007ffa7806
x26: 000000007ffa7834 x27: 0000000000000024
x28: 000000007dddd040 x29: 000000007ede9990

UEFI image [0x000000007ddd7000:0x000000007ddd749f] pc=0x19c '/bug.efi'
Resetting CPU ...
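
The esr value in the first line of the dump can be split into its
architectural fields. The following is only a minimal sketch, assuming the
AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0);
the helper name decode_esr is made up for illustration and is not part of
U-Boot or KVM.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into its EC, IL and ISS fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits 24:0
    }


# esr value copied from the dump above.
fields = decode_esr(0x02000000)
print(fields)  # {'ec': 0, 'il': 1, 'iss': 0}
```

For the value in this dump the sketch yields EC 0x00 with IL set. The
architecture names that class "Unknown reason", while data aborts are
reported with EC 0x24 or 0x25.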

From the dump I see that the problem occurred at offset 0x19c from the
start of the loaded binary bug.efi. The next step is to look at the map
file of bug.efi to find out at which instruction the problem occurred.
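
The offset is simply the non-relocated elr minus the image base printed in
the "UEFI image" line. A minimal sketch of that arithmetic, using the values
from the dump; the function name image_offset is hypothetical and only
serves the illustration.

```python
def image_offset(elr: int, image_base: int, image_end: int) -> int:
    """Return the offset of elr inside the image [image_base, image_end]."""
    if not (image_base <= elr <= image_end):
        raise ValueError("elr is outside the loaded image")
    return elr - image_base


# Values copied from the dump: second elr line and the UEFI image range.
offset = image_offset(0x7DDD719C, 0x7DDD7000, 0x7DDD749F)
print(hex(offset))  # 0x19c
```

The resulting offset is the value to look up in the map file or a
disassembly of bug.efi.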

After printing the dump, U-Boot goes on to reset the system.

When U-Boot is running the EDK II SCT (a test suite for UEFI firmware),
SCT will log that a restart occurred (indicating that a test failed) and
continue to run the next test.

Best regards

Heinrich