Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

From: Peter Xu
Date: Thu Apr 14 2022 - 11:46:04 EST


On Thu, Apr 14, 2022 at 01:56:12PM +0000, Sean Christopherson wrote:
> On Wed, Apr 13, 2022, Peter Xu wrote:
> > Our QE team reported test failure on access_tracking_perf_test:
> >
> > Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> > guest physical test memory offset: 0x3fffbffff000
> >
> > Populating memory : 0.684014577s
> > Writing to populated memory : 0.006230175s
> > Reading from populated memory : 0.004557805s
> > ==== Test Assertion Failure ====
> > lib/kvm_util.c:1411: false
> > pid=125806 tid=125809 errno=4 - Interrupted system call
> > 1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411
> > 2 (inlined by) addr_gpa2hva at kvm_util.c:1405
> > 3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98
> > 4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152
> > 5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232
> > 6 0x00007fefe9ff81ce: ?? ??:0
> > 7 0x00007fefe9c64d82: ?? ??:0
> > No vm physical memory at 0xffbffff000
> >
> > And I can easily reproduce it with a Intel(R) Xeon(R) CPU E5-2630 with 46
> > bits PA.
> >
> > It turns out that the address translation for clearing idle page tracking
> > returned wrong result, in which addr_gva2gpa()'s last step should have
>
> "should have" is very misleading, that makes it sound like the address was
> intentionally truncated. Or did you mean "should have been treated as 64-bit
> value"?

No I was purely lazy yesterday as it was late, sorry. Obviously I should
have hold-off the patch until this morning because I do plan to look at
this into more details.

So sadly it's only gcc that's not working properly with the bitfields.. at
least in my minimum test here.

---8<---
$ cat a.c
#include <stdio.h>

struct test1 {
unsigned long a:24;
unsigned long b:40;
};

int main(void)
{
struct test1 val;
val.b = 0x123456789a;
printf("0x%lx\n", val.b * 16);
return 0;
}
$ gcc -o a.gcc a.c
$ clang -o a.clang a.c
$ ./a.gcc
0x23456789a0
$ ./a.clang
0x123456789a0
$ objdump -d a.gcc | grep -A20 -w main
0000000000401126 <main>:
401126: 55 push %rbp
401127: 48 89 e5 mov %rsp,%rbp
40112a: 48 83 ec 10 sub $0x10,%rsp
40112e: 48 8b 45 f8 mov -0x8(%rbp),%rax
401132: 25 ff ff ff 00 and $0xffffff,%eax
401137: 48 89 c2 mov %rax,%rdx
40113a: 48 b8 00 00 00 9a 78 movabs $0x123456789a000000,%rax
401141: 56 34 12
401144: 48 09 d0 or %rdx,%rax
401147: 48 89 45 f8 mov %rax,-0x8(%rbp)
40114b: 48 8b 45 f8 mov -0x8(%rbp),%rax
40114f: 48 c1 e8 18 shr $0x18,%rax
401153: 48 c1 e0 04 shl $0x4,%rax
401157: 48 ba ff ff ff ff ff movabs $0xffffffffff,%rdx
40115e: 00 00 00
401161: 48 21 d0 and %rdx,%rax <-------------------
401164: 48 89 c6 mov %rax,%rsi
401167: bf 10 20 40 00 mov $0x402010,%edi
40116c: b8 00 00 00 00 mov $0x0,%eax
401171: e8 ba fe ff ff callq 401030 <printf@plt>
$ objdump -d a.clang | grep -A20 -w main
0000000000401130 <main>:
401130: 55 push %rbp
401131: 48 89 e5 mov %rsp,%rbp
401134: 48 83 ec 10 sub $0x10,%rsp
401138: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40113f: 48 8b 45 f0 mov -0x10(%rbp),%rax
401143: 48 25 ff ff ff 00 and $0xffffff,%rax
401149: 48 b9 00 00 00 9a 78 movabs $0x123456789a000000,%rcx
401150: 56 34 12
401153: 48 09 c8 or %rcx,%rax
401156: 48 89 45 f0 mov %rax,-0x10(%rbp)
40115a: 48 8b 75 f0 mov -0x10(%rbp),%rsi
40115e: 48 c1 ee 18 shr $0x18,%rsi
401162: 48 c1 e6 04 shl $0x4,%rsi
401166: 48 bf 10 20 40 00 00 movabs $0x402010,%rdi
40116d: 00 00 00
401170: b0 00 mov $0x0,%al
401172: e8 b9 fe ff ff callq 401030 <printf@plt>
401177: 31 c0 xor %eax,%eax
401179: 48 83 c4 10 add $0x10,%rsp
40117d: 5d pop %rbp
---8<---

>
> > treated "pte[index[0]].pfn" to be a 32bit value.
>
> It didn't get treated as a 32-bit value, it got treated as a 40-bit value, because
> the pfn is stored as 40 bits.
>
> struct pageTableEntry {
> uint64_t present:1;
> uint64_t writable:1;
> uint64_t user:1;
> uint64_t write_through:1;
> uint64_t cache_disable:1;
> uint64_t accessed:1;
> uint64_t dirty:1;
> uint64_t reserved_07:1;
> uint64_t global:1;
> uint64_t ignored_11_09:3;
> uint64_t pfn:40; <================
> uint64_t ignored_62_52:11;
> uint64_t execute_disable:1;
> };
>
> > In above case the GPA
> > address 0x3fffbffff000 got cut-off into 0xffbffff000, then it caused
> > further lookup failure in the gpa2hva mapping.
> >
> > I didn't yet check any other test that may fail too on some hosts, but
> > logically any test using addr_gva2gpa() could suffer.
> >
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> > ---
> > tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index 9f000dfb5594..6c356fb4a9bf 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -587,7 +587,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> > if (!pte[index[0]].present)
> > goto unmapped_gva;
> >
> > - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
>
> This is but one of many paths that can get burned by pfn being 40 bits. The
> most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> place the myriad "pfn * vm->page_size" instances.

Yes, I'll respin.

>
> For a true long term solution, my vote is to do away with the bit field struct
> and use #define'd masks and whatnot.

I agree.

Thanks,

--
Peter Xu