Re: [RFC v1 2/2] mm/gup/writeback: add callbacks for inaccessible pages

From: Christian Borntraeger
Date: Fri Feb 28 2020 - 11:08:30 EST


Andrew,

while patch 1 is a fixup for the FOLL_PIN work in your patch queue,
I would really love to see this patch in 5.7. The kvm/s390 code that
exploits it is in linux-next and is also scheduled for 5.7.

Christian

On 28.02.20 16:43, Claudio Imbrenda wrote:
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
>
> While CPU accesses will trigger a fault that can be resolved, I/O
> accesses will just fail. We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
>
> This is not only to enable paging, file backing, etc.; it is also
> necessary to protect the host against malicious user space. For
> example, a bad QEMU could simply start direct I/O on such protected
> memory. We do not want userspace to be able to trigger I/O errors, so
> the logic is "whenever somebody accesses that page (gup) or does I/O,
> make sure that this page can be accessed". When the guest tries to
> access that page, we will wait in the page fault handler for writeback
> to have finished and for the page_ref to be the expected value.
>
> On s390x the function is not supposed to fail, so it is ok to use a
> WARN_ON on failure. If we ever need some more fine-grained handling,
> we can tackle this when we know the details.
>
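
For readers following along, the hook relies on the usual "generic no-op
unless the architecture overrides it" pattern. Below is a minimal
user-space sketch of that pattern, not kernel code. The struct page
layout, the DEMO_ARCH_HOOK switch and prepare_for_io() are hypothetical
names made up for illustration; only arch_make_page_accessible() and
HAVE_ARCH_MAKE_PAGE_ACCESSIBLE come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	bool accessible;
};

/*
 * An architecture that needs the hook defines
 * HAVE_ARCH_MAKE_PAGE_ACCESSIBLE and supplies its own implementation.
 * Building with -DDEMO_ARCH_HOOK (an assumption of this sketch)
 * emulates such an architecture.
 */
#ifdef DEMO_ARCH_HOOK
#define HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	page->accessible = true;	/* pretend the page was exported */
	return 0;
}
#endif

/* Generic fallback: every other architecture gets a no-op. */
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
	(void)page;
	return 0;
}
#endif

/* Hypothetical caller: invoke the hook before starting I/O on a page. */
static inline bool prepare_for_io(struct page *page)
{
	assert(page != NULL);
	return arch_make_page_accessible(page) == 0;
}
```

With the default build the call compiles down to "return 0", so
architectures without inaccessible pages pay nothing for the hook.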
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Acked-by: Will Deacon <will@xxxxxxxxxx>
> Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
> Reviewed-by: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> ---
>  include/linux/gfp.h |  6 ++++++
>  mm/gup.c            | 19 ++++++++++++++++---
>  mm/page-writeback.c |  5 +++++
>  3 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index e5b817cb86e7..be2754841369 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>  #ifndef HAVE_ARCH_ALLOC_PAGE
>  static inline void arch_alloc_page(struct page *page, int order) { }
>  #endif
> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +static inline int arch_make_page_accessible(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
>
>  struct page *
>  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> diff --git a/mm/gup.c b/mm/gup.c
> index 0b9a806898f3..86fff6e4e4f3 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -391,6 +391,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	struct page *page;
>  	spinlock_t *ptl;
>  	pte_t *ptep, pte;
> +	int ret;
>
>  	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
>  	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
> @@ -449,8 +450,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		if (is_zero_pfn(pte_pfn(pte))) {
>  			page = pte_page(pte);
>  		} else {
> -			int ret;
> -
>  			ret = follow_pfn_pte(vma, address, ptep, flags);
>  			page = ERR_PTR(ret);
>  			goto out;
> @@ -458,7 +457,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  	}
>
>  	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
> -		int ret;
>  		get_page(page);
>  		pte_unmap_unlock(ptep, ptl);
>  		lock_page(page);
> @@ -475,6 +473,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>  		page = ERR_PTR(-ENOMEM);
>  		goto out;
>  	}
> +	if (flags & FOLL_PIN) {
> +		ret = arch_make_page_accessible(page);
> +		if (ret) {
> +			unpin_user_page(page);
> +			page = ERR_PTR(ret);
> +			goto out;
> +		}
> +	}
>  	if (flags & FOLL_TOUCH) {
>  		if ((flags & FOLL_WRITE) &&
>  		    !pte_dirty(pte) && !PageDirty(page))
> @@ -2143,6 +2149,13 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>
>  		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>
> +		if (flags & FOLL_PIN) {
> +			ret = arch_make_page_accessible(page);
> +			if (ret) {
> +				unpin_user_page(page);
> +				goto pte_unmap;
> +			}
> +		}
>  		SetPageReferenced(page);
>  		pages[*nr] = page;
>  		(*nr)++;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ab5a3cee8ad3..8384be5a2758 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>  		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>  	}
>  	unlock_page_memcg(page);
> +	/*
> +	 * If writeback has been triggered on a page that cannot be made
> +	 * accessible, it is too late.
> +	 */
> +	WARN_ON(arch_make_page_accessible(page));
>  	return ret;
>
>  }
>
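
As a rough illustration of the FOLL_PIN path in the gup hunks above
(take the pin, try to make the page accessible, drop the pin again on
failure), here is a small user-space model. It is only a sketch under
stated assumptions: the struct page fields, the FOLL_PIN value and the
helpers pin_page(), unpin_page() and pin_one_page() are hypothetical
stand-ins, and the mock arch_make_page_accessible() fails on purpose
for pages marked as not exportable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FOLL_PIN 0x1u	/* hypothetical value, for this sketch only */

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int pincount;	/* number of pins currently held */
	int accessible;	/* 1 if the host may access the page */
	int can_export;	/* 0 makes the mock hook fail */
};

/* Mock arch hook: make the page accessible or report an error. */
static int arch_make_page_accessible(struct page *page)
{
	if (page->accessible)
		return 0;
	if (!page->can_export)
		return -EFAULT;
	page->accessible = 1;
	return 0;
}

static void pin_page(struct page *page)
{
	page->pincount++;
}

static void unpin_page(struct page *page)
{
	assert(page->pincount > 0);
	page->pincount--;
}

/*
 * Pin one page. Returns 0 and stores the page in *out on success,
 * or a negative errno with *out set to NULL and no pin left behind.
 */
static int pin_one_page(struct page *page, unsigned int flags,
			struct page **out)
{
	int ret;

	*out = NULL;
	if (flags & FOLL_PIN) {
		pin_page(page);
		ret = arch_make_page_accessible(page);
		if (ret) {
			unpin_page(page);	/* undo the pin on failure */
			return ret;
		}
	}
	*out = page;
	return 0;
}
```

The point of the ordering is that a caller never ends up holding a pin
on a page it cannot access: either both steps succeed or neither leaves
a trace.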