Re: [PATCH v3 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests

From: Alexander Gordeev
Date: Tue Apr 16 2024 - 02:39:27 EST


On Mon, Apr 15, 2024 at 09:14:03PM +0200, David Hildenbrand wrote:
> > > +retry:
> > > + rc = walk_page_range_vma(vma, addr, vma->vm_end,
> > > + &find_zeropage_ops, &addr);
> > > + if (rc <= 0)
> > > + continue;
> >
> > So in case an error is returned for the last vma, __s390_unshare_zeropage()
> > finishes with that error. By contrast, the error for a non-last vma would
> > be ignored?
>
> Right, it looks a bit off. walk_page_range_vma() shouldn't fail
> unless find_zeropage_pte_entry() would fail -- which would also be
> very unexpected.
>
> To handle it cleanly in case we would ever get a weird zeropage where we
> don't expect it, we should probably just exit early.
>
> Something like the following (not compiled, addressing the comment below):

> @@ -2618,7 +2618,8 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
> struct vm_area_struct *vma;
> VMA_ITERATOR(vmi, mm, 0);
> unsigned long addr;
> - int rc;
> + vm_fault_t rc;
> + int zero_page;

I would use "fault" for mm faults (just like everywhere else handle_mm_fault() is
called) and leave rc as is:

vm_fault_t fault;
int rc;

> for_each_vma(vmi, vma) {
> /*
> @@ -2631,9 +2632,11 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
> addr = vma->vm_start;
> retry:
> - rc = walk_page_range_vma(vma, addr, vma->vm_end,
> - &find_zeropage_ops, &addr);
> - if (rc <= 0)
> + zero_page = walk_page_range_vma(vma, addr, vma->vm_end,
> + &find_zeropage_ops, &addr);
> + if (zero_page < 0)
> + return zero_page;
> + else if (!zero_page)
> continue;
> /* addr was updated by find_zeropage_pte_entry() */
> @@ -2656,7 +2659,7 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
> goto retry;
> }
> - return rc;
> + return 0;
> }
> static int __s390_disable_cow_sharing(struct mm_struct *mm)

..

> > > + /* addr was updated by find_zeropage_pte_entry() */
> > > + rc = handle_mm_fault(vma, addr,
> > > + FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
> > > + NULL);
> > > + if (rc & VM_FAULT_OOM)
> > > + return -ENOMEM;
> >
> > Heiko pointed out that rc type is inconsistent vs vm_fault_t returned by
>
> Right, let's use another variable for that.
>
> > handle_mm_fault(). While fixing it up, I've got concerned whether is it
> > fine to continue in case any other error is met (including possible future
> > VM_FAULT_xxxx)?
>
> Such future changes would similarly break break_ksm(). Staring at it, I do wonder
> if break_ksm() should be handling VM_FAULT_HWPOISON ... very likely we should
> handle it and fail -- we might get an MC while copying from the source page.
>
> VM_FAULT_HWPOISON on the shared zeropage would imply a lot of trouble, so
> I'm not concerned about that for the case here, but handling it in the future
> would be cleaner.
>
> Note that we always retry the lookup, so we won't just skip a zeropage on unexpected
> errors.
>
> We could piggy-back on vm_fault_to_errno(). We could use
> vm_fault_to_errno(rc, FOLL_HWPOISON), and only continue (retry) if the rc is 0 or
> -EFAULT, otherwise fail with the returned error.
>
> But I'd do that as a follow up, and also use it in break_ksm() in the same fashion.

@Christian, do you agree with this suggestion?

Thanks!