RE: Hugepage regression

From: Chen, Kenneth W
Date: Tue Oct 10 2006 - 15:30:36 EST


Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM
> On Tue, 10 Oct 2006, Chen, Kenneth W wrote:
> >
> > With the pending shared page table for hugetlb currently sitting in -mm,
> > we serialize the all hugetlb unmap with a per file i_mmap_lock. This
> > race could well be solved by that pending patch?
> >
> >
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v
4.patch
>
> Hey, nice try, Ken! But I don't think we can let you sneak shared
> pagetables into 2.6.19 that way ;)

It wasn't my intention to sneak in shared page table, though it does
sort of look like so.


> Yes, I'd expect your i_mmap_lock to solve the problem: and since
> you're headed in that direction anyway, it makes most sense to use
> that solution rather than get into defining arrays, or sacrificing
> the lazy flush, or risking page_count races.
>
> So please extract the __unmap_hugepage_range mods from your shared
> pagetable patch, and use that to fix the bug.

OK, I was about to do so too.


> But again, I protest
> the "if (vma->vm_file)" in your unmap_hugepage_range - how would a
> hugepage area ever have NULL vma->vm_file?

It's coming from do_mmap_pgoff(), file->f_op->mmap can fail with error
code (e.g. not enough hugetlb page) and in the error recovery path, it
nulls out vma->vm_file first before calls down to unmap_region(). I
asked that question before: can we reverse that order (call unmap_region
and then nulls out vma->vmfile and fput)?

unmap_and_free_vma:
if (correct_wcount)
atomic_inc(&inode->i_writecount);
vma->vm_file = NULL;
fput(file);

/* Undo any partial mapping done by a device driver. */
unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/