Re: [PATCH] vmap(): don't allow invalid pages

From: Mark Rutland
Date: Wed Jan 19 2022 - 13:06:23 EST


On Wed, Jan 19, 2022 at 09:00:23AM -0800, Yury Norov wrote:
> On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Hi,
> >
> > I replied to the original RFC before spotting this; duplicating those
> > comments here because I think they apply regardless of the mechanism used
> > to work around this.
> >
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> > > vmap() takes struct page *pages as one of its arguments, and the user may
> > > provide an invalid pointer, which leads to a data abort (DABT) at address
> > > translation later.
> > >
> > > Currently, the kernel checks the pages against NULL. In my case, however,
> > > the address was not NULL, and was large enough that the hardware generated
> > > an Address Size Abort on arm64.
> >
> > Can you give an example of when this might happen? It sounds like you're
> > actually hitting this, so a backtrace would be nice.
> >
> > I'm a bit confused as to why we'd try to vmap() pages that we
> > didn't have a legitimate struct page for -- where did these addresses
> > come from?
> >
> > It sounds like this is going wrong at a higher level, and we're passing
> > entirely bogus struct page pointers around. This seems like the sort of
> > thing DEBUG_VIRTUAL or similar should check when we initially generate
> > the struct page pointer.
>
> Hi Mark,
>
> This is out-of-tree code that does:
>
> vaddr1 = dma_alloc_coherent()
> page = virt_to_page(vaddr1) // Wrong here: vaddr1 need not be in the linear map
> ...
> vaddr2 = vmap(page)
> memset(vaddr2) // Fault here
>
> virt_to_page() returns a bogus pointer if vaddr1 is not a linear-map kernel
> address. The problem is that vmap() successfully populates the PTE with the
> bad PFN, which makes the bug much harder to debug at memory access time.

Ah, ok. FWIW, that case should be caught by DEBUG_VIRTUAL when that's
enabled.
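
For that kind of out-of-tree code, the cheap way to catch the mistake at
the source is to sanity-check the address before converting it. A sketch
(the helper name is made up, error handling elided):

/*
 * virt_to_page() is only defined for linear-map addresses;
 * virt_addr_valid() rejects anything else, and is roughly the
 * check DEBUG_VIRTUAL applies on the virt_to_phys() path.
 */
static struct page *coherent_buf_to_page(void *vaddr)
{
	if (WARN_ON(!virt_addr_valid(vaddr)))
		return NULL;	/* e.g. a dma_alloc_coherent() remap */
	return virt_to_page(vaddr);
}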

It would be nice to put that in the commit message, since it makes clear
this is about catching a mistake in buggy code, and points at where to go
looking next.

> > > Interestingly, this abort happens even if copy_from_kernel_nofault() is
> > > used, which is quite inconvenient for debugging purposes.
> >
> > I can go take a look at this, but TBH we never expect to take an address size
> > fault to begin with, so this is arguably correct -- it's an internal
> > consistency problem.
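
For reference, the probing pattern in question looks something like this
(a sketch, assuming vaddr2 is the vmap()ed address from above); note that
copy_from_kernel_nofault() can only recover from faults the exception
fixup code knows how to handle, which is why an unexpected Address Size
Abort still takes the machine down:

int val;

/* returns nonzero if the read faults in a fixup-able way */
if (copy_from_kernel_nofault(&val, vaddr2, sizeof(val)))
	pr_warn("%px is not readable\n", vaddr2);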
> >
> > > This patch adds a pfn_valid() check into vmap() path, so that invalid
> > > mapping will not be created.
> > >
> > > RFC: https://lkml.org/lkml/2022/1/18/815
> > > v1: use pfn_valid() instead of adding an arch-specific
> > > arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> > >
> > > Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> > > ---
> > > mm/vmalloc.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index d2a00ad4e1dd..a4134ee56b10 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> > >  		return -EBUSY;
> > >  	if (WARN_ON(!page))
> > >  		return -ENOMEM;
> > > +	if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > > +		return -EINVAL;
> >
> > My fear here is that for this to fire, we've already passed a bogus struct page
> > pointer around the intermediate infrastructure, and any of that might try to
> > use it in unsafe ways (in future even if we don't use it today).
> >
> > I think the fundamental issue here is that we generate a bogus struct page
> > pointer at all, and knowing where that came from would help to fix that.
>
> You're right. That's why WARN_ON() is used for the page == NULL check in the
> code above, I believe: to let client code know that something went wrong and
> that it's not a regular ENOMEM situation.

Sure; with all the above in mind I think it's fine for this to be
best-effort (e.g. as Robin noted, page_to_pfn() might consume parts of
the struct page before this check runs), since either way it will
indicate roughly where the problem is.
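
To spell out Robin's point: with classic SPARSEMEM (no vmemmap),
page_to_pfn() has to read page->flags to locate the backing mem_section,
so a wild struct page pointer can fault inside the check itself. A rough
paraphrase (not the verbatim macro) of the generic helper in
include/asm-generic/memory_model.h:

static unsigned long sketch_page_to_pfn(const struct page *page)
{
	/* page_to_section() dereferences page->flags */
	unsigned long nr = page_to_section(page);

	return (unsigned long)(page - __section_mem_map_addr(__nr_to_section(nr)));
}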

As above, it would just be nice for the commit message to be a little
more explicit about that.

Thanks,
Mark.