Re: [PATCH 8/9] x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()

From: Al Viro
Date: Sat May 30 2020 - 14:39:27 EST


On Sat, May 30, 2020 at 10:57:24AM -0700, Linus Torvalds wrote:

> So no. I disagree. There is absolutely nothing "obviously ok" about
> any of that kvm code. Quite the reverse.
>
> I'd argue that it's very much obviously *NOT* ok, even while it might
> just happen to work.

Actually, it's somewhat less brittle than you think (on non-mips, at least)
and not due to those long-ago access_ok().

> That double underscore needs to go away. It's either actively buggy
> right now and I see no proof it isn't, or it's a bug just waiting to
> happen in the future.

FWIW, the kvm side of things (vhost is yet another pile of fun) is

[x86] kvm_hv_set_msr_pw():
arch/x86/kvm/hyperv.c:1027: if (__copy_to_user((void __user *)addr, instructions, 4))
HV_X64_MSR_HYPERCALL
arch/x86/kvm/hyperv.c:1132: if (__clear_user((void __user *)addr, sizeof(u32)))
HV_X64_MSR_VP_ASSIST_PAGE
in both cases addr comes from
gfn = data >> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT;
addr = kvm_vcpu_gfn_to_hva(vcpu, gfn);
if (kvm_is_error_hva(addr))
return 1;

[x86] FNAME(walk_addr_generic), very hot:
arch/x86/kvm/mmu/paging_tmpl.h:403: if (unlikely(__get_user(pte, ptep_user)))
index = PT_INDEX(addr, walker->level);
...
offset = index * sizeof(pt_element_t);
...
host_addr = kvm_vcpu_gfn_to_hva_prot(vcpu, real_gfn,
&walker->pte_writable[walker->level - 1]);
if (unlikely(kvm_is_error_hva(host_addr)))
goto error;
ptep_user = (pt_element_t __user *)((void *)host_addr + offset);

__kvm_read_guest_page():
virt/kvm/kvm_main.c:2252: r = __copy_from_user(data, (void __user *)addr + offset, len);
addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
if (kvm_is_error_hva(addr))
return -EFAULT;

__kvm_read_guest_atomic():
virt/kvm/kvm_main.c:2326: r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
if (kvm_is_error_hva(addr))
return -EFAULT;

__kvm_write_guest_page():
virt/kvm/kvm_main.c:2353: r = __copy_to_user((void __user *)addr + offset, data, len);
addr = gfn_to_hva_memslot(memslot, gfn);
if (kvm_is_error_hva(addr))
return -EFAULT;

kvm_write_guest_offset_cached():
virt/kvm/kvm_main.c:2490: r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
if (kvm_is_error_hva(ghc->hva))
return -EFAULT;

kvm_read_guest_cached():
virt/kvm/kvm_main.c:2525: r = __copy_from_user(data, (void __user *)ghc->hva, len);
if (kvm_is_error_hva(ghc->hva))
return -EFAULT;

default kvm_is_error_hva() is addr >= PAGE_OFFSET; however, on mips and s390 it's
IS_ERR_VALUE().

Sure, we can use non-__ variants, but is access_ok() the right primitive here?
We want userland memory, set_fs() be damned.