Re: [PATCH 1/4] mm: Export flush_vm_area() to sync the PTEs upon construction

From: Chris Wilson
Date: Fri Aug 21 2020 - 05:54:34 EST


Quoting Joerg Roedel (2020-08-21 10:51:29)
> On Fri, Aug 21, 2020 at 09:50:08AM +0100, Chris Wilson wrote:
> > The alloc_vm_area() is another method for drivers to
> > vmap/map_kernel_range that uses apply_to_page_range() rather than the
> > direct vmalloc walkers. This is missing the page table modification
> > tracking, and the ability to synchronize the PTE updates afterwards.
> > Provide flush_vm_area() for the users of alloc_vm_area() that assumes
> > the worst and ensures that the page directories are correctly flushed
> > upon construction.
> >
> > The impact is most pronounced on x86_32 due to the delayed set_pmd().
> >
> > Reported-by: Pavel Machek <pavel@xxxxxx>
> > References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
> > References: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Joerg Roedel <jroedel@xxxxxxx>
> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > Cc: Dave Airlie <airlied@xxxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> > Cc: Pavel Machek <pavel@xxxxxx>
> > Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.8+
> > ---
> > include/linux/vmalloc.h | 1 +
> > mm/vmalloc.c | 16 ++++++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 0221f852a7e1..a253b27df0ac 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
> >
> > /* Allocate/destroy a 'vmalloc' VM area. */
> > extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
> > +extern void flush_vm_area(struct vm_struct *area);
> > extern void free_vm_area(struct vm_struct *area);
> >
> > /* for /dev/kmem */
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b482d240f9a2..c41934486031 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
> > }
> > EXPORT_SYMBOL_GPL(alloc_vm_area);
> >
> > +void flush_vm_area(struct vm_struct *area)
> > +{
> > +	unsigned long addr = (unsigned long)area->addr;
> > +
> > +	/* apply_to_page_range() doesn't track the damage, assume the worst */
> > +	if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
> > +					 PGTBL_PMD_MODIFIED |
> > +					 PGTBL_PUD_MODIFIED |
> > +					 PGTBL_P4D_MODIFIED |
> > +					 PGTBL_PGD_MODIFIED))
> > +		arch_sync_kernel_mappings(addr, addr + area->size);
>
> This should happen in __apply_to_page_range() directly and look like
> this:

Ok. I thought it had to be after assigning the *ptep. If we apply the
sync first, do we not have to worry about PGTBL_PTE_MODIFIED from the
*ptep?
-Chris
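
For readers following the ordering question, here is a rough user-space
model of the "track which levels were modified, then sync once" pattern
that the referenced commits describe. It is only a sketch, not kernel
code and not the snippet Joerg proposed (that part of his mail is not
quoted here). The names prefixed with model_, the sync_calls counter,
the numeric bit values and the choice of ARCH_PAGE_TABLE_SYNC_MASK are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Model-only mask type; bit values are assumptions for this sketch. */
typedef unsigned int model_mod_mask;

#define PGTBL_PGD_MODIFIED (1u << 0)
#define PGTBL_P4D_MODIFIED (1u << 1)
#define PGTBL_PUD_MODIFIED (1u << 2)
#define PGTBL_PMD_MODIFIED (1u << 3)
#define PGTBL_PTE_MODIFIED (1u << 4)

/* Hypothetical architecture: only PMD-level changes need a sync. */
#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PMD_MODIFIED

int sync_calls; /* counts how often the model sync hook ran */

/* Stand-in for an arch sync hook; it only records and logs the call. */
void model_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	sync_calls++;
	printf("sync [%#lx, %#lx)\n", start, end);
}

/* Stand-in for the page-table walk: record every level it touched. */
void model_walk(int installs_new_pmd, model_mod_mask *mask)
{
	if (installs_new_pmd)
		*mask |= PGTBL_PMD_MODIFIED;
	/* The PTE write is recorded during the walk, before any sync. */
	*mask |= PGTBL_PTE_MODIFIED;
}

/* Walk first, then sync once if a level the arch cares about changed. */
int model_apply_to_range(unsigned long start, unsigned long size,
			 int installs_new_pmd)
{
	model_mod_mask mask = 0;

	model_walk(installs_new_pmd, &mask);
	if (mask & ARCH_PAGE_TABLE_SYNC_MASK) {
		model_sync_kernel_mappings(start, start + size);
		return 1;
	}
	return 0;
}

/* Small self-check of the two cases: PTE-only and new-PMD. */
void model_self_check(void)
{
	sync_calls = 0;
	assert(model_apply_to_range(0x1000, 0x2000, 0) == 0);
	assert(model_apply_to_range(0x1000, 0x2000, 1) == 1);
	assert(sync_calls == 1);
}
```

In this model the mask is tested only after the walk has finished, so
the PTE bit is already recorded by the time the check runs. A PTE-only
update does not trigger the sync here simply because the assumed arch
mask does not include PGTBL_PTE_MODIFIED.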