Re: [PATCH v8 01/17] KVM: s390: pv: leak the topmost page table when destroy fails

From: Claudio Imbrenda
Date: Thu Mar 03 2022 - 10:06:03 EST


On Thu, 3 Mar 2022 15:40:42 +0100
Janis Schoetterl-Glausch <scgl@xxxxxxxxxxxxx> wrote:

> On 3/2/22 19:11, Claudio Imbrenda wrote:
> > Each secure guest must have a unique ASCE (address space control
> > element); we must avoid that new guests use the same page for their
> > ASCE, to avoid errors.
> >
> > Since the ASCE mostly consists of the address of the topmost page table
> > (plus some flags), we must not return that memory to the pool unless
> > the ASCE is no longer in use.
> >
> > Only a successful Destroy Secure Configuration UVC will make the ASCE
> > reusable again.
> >
> > If the Destroy Configuration UVC fails, the ASCE cannot be reused for a
> > secure guest (either for the ASCE or for other memory areas). To avoid
> > a collision, it must not be used again. This is a permanent error and
> > the page becomes in practice unusable, so we set it aside and leak it.
> > On failure we already leak other memory that belongs to the ultravisor
> > (i.e. the variable and base storage for a guest) and not leaking the
> > topmost page table was an oversight.
> >
> > This error (and thus the leakage) should not happen unless the hardware
> > is broken or KVM has some unknown serious bug.
> >
> > Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> > Fixes: 29b40f105ec8d55 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
> > ---
> > arch/s390/include/asm/gmap.h | 2 +
> > arch/s390/kvm/pv.c | 9 +++--
> > arch/s390/mm/gmap.c | 71 ++++++++++++++++++++++++++++++++++++
> > 3 files changed, 79 insertions(+), 3 deletions(-)
> >
> [...]
>
> > +/**
> > + * s390_replace_asce - Try to replace the current ASCE of a gmap with
> > + * another equivalent one.
> > + * @gmap the gmap
> > + *
> > + * If the allocation of the new top level page table fails, the ASCE is not
> > + * replaced.
> > + * In any case, the old ASCE is always removed from the list. Therefore the
> > + * caller has to make sure to save a pointer to it beforehands, unless an
> > + * intentional leak is intended.
> > + */
> > +int s390_replace_asce(struct gmap *gmap)
> > +{
> > + unsigned long asce;
> > + struct page *page;
> > + void *table;
> > +
> > + s390_remove_old_asce(gmap);
> > +
> > + page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> > + if (!page)
> > + return -ENOMEM;
> > + table = page_to_virt(page);
> > + memcpy(table, gmap->table, 1UL << (CRST_ALLOC_ORDER + PAGE_SHIFT));
>
> Is concurrent modification of *gmap->table possible during the copy?

that would only be possible if the guest touches memory in such way
that the table needs to be changed.

this function is only called when the guest is not running (e.g. during
reboot), so nobody should touch the table

>
> > +
> > + /*
> > + * The caller has to deal with the old ASCE, but here we make sure
> > + * the new one is properly added to the list of page tables, so that
> > + * it will be freed when the VM is torn down.
> > + */
> > + spin_lock(&gmap->guest_table_lock);
> > + list_add(&page->lru, &gmap->crst_list);
> > + spin_unlock(&gmap->guest_table_lock);
> > +
> > + /* Set new table origin while preserving existing ASCE control bits */
> > + asce = (gmap->asce & _ASCE_ORIGIN) | __pa(table);
> > + WRITE_ONCE(gmap->asce, asce);
> > + WRITE_ONCE(gmap->mm->context.gmap_asce, asce);
> > + WRITE_ONCE(gmap->table, table);
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(s390_replace_asce);
>