Re: [PATCH] RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()

From: Anup Patel
Date: Thu Jun 19 2025 - 03:06:44 EST


On Tue, Jun 17, 2025 at 8:06 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Sun, Jun 15, 2025, Anup Patel wrote:
> > On Sat, Jun 14, 2025 at 3:59 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 12, 2025, Andrew Jones wrote:
> > > > On Wed, Jun 11, 2025 at 09:17:36AM -0700, Sean Christopherson wrote:
> > > > > Looks like y'all also have a bug where an -EEXIST will be returned to userspace,
> > > > > and will generate what's probably a spurious kvm_err() message.
> > > >
> > > > On 32-bit riscv, due to losing the upper bits of the physical address? Or
> > > > is there yet another thing to fix?
> > >
> > > Another bug, I think. gstage_set_pte() returns -EEXIST if a PTE exists, and I
> > > _assume_ that's supposed to be benign? But this code returns it blindly:
> >
> > gstage_set_pte() returns -EEXIST only when it was expecting a non-leaf
> > PTE at a particular level but got a leaf PTE
>
> Right, but isn't returning -EEXIST all the way to userspace undesirable behavior?
>
> E.g. in this sequence, KVM will return -EEXIST and incorrectly terminate the VM
> (assuming the VMM doesn't miraculously recover somehow):
>
> 1. Back the VM with HugeTLBFS
> 2. Fault-in memory, i.e. create hugepage mappings
> 3. Enable KVM_MEM_LOG_DIRTY_PAGES
> 4. Write-protection fault, kvm_riscv_gstage_map() tries to create a writable
> non-huge mapping.
> 5. gstage_set_pte() encounters the huge leaf PTE before reaching the target
> level, and returns -EEXIST.

The gstage_set_pte() does not fail in any of the above cases because the
desired page table "level" of the PTE is passed to gstage_set_pte() as
parameter. The -EEXIST failure is only when gstage_set_pte() sees an
existing leaf PTE at a level above the desired page table level which can
only occur if there is some BUG in KVM g-stage programming.

>
> AFAICT, gstage_wp_memory_region() doesn't split/shatter/demote hugepages, it
> simply clears _PAGE_WRITE.
>
> It's entirely possible I'm missing something that makes the above scenario
> impossible in practice, but at this point I'm genuinely curious :-)

The -EEXIST failure in gstage_set_pte() is very unlikely to happen but I see
your point about unnecessarily exiting to user space since user space has
nothing to do with this failure.

I think it's better to WARN() and return 0 instead of returning -EEXIST.

Regards,
Anup