Re: [PATCH] - Fix unmap_vma() bug related to mmu_notifiers

From: Andrea Arcangeli
Date: Thu Jan 28 2010 - 05:04:42 EST


On Wed, Jan 27, 2010 at 09:49:44PM -0600, Robin Holt wrote:
> > I think that with the SRCU patch, we have enough. Is that true or have
> > I missed something?
>
> I wasn't quite complete in my previous email. Your srcu patch
> plus Jack's patch to move the tlb_gather_mmu to after the
> mmu_notifier_invalidate_range_start().

My pmdp_clear_flush_notify with transparent hugepage will give some
trouble because it's using mmu_notifier_invalidate_range_start to
provide backwards compatible API to mmu notifier users like GRU that
may be mapping physical hugepages with 4k secondary tlb mappings
(which have to be all invalidated not only the first one). So that
would still require the full series as it's like if the rmap code
would be using mmu_notifier_invalidate_range_start. But we can
probably get away by forcing all mmu notifier methods to provide a
mmu_notifier_invalidate_hugepage.

But in addition to srcu surely you also need i_mmap_lock_to_sem for
unmap_mapping_range_vma taking i_mmap_lock, basically you missed
truncate. Which then in cascade requires free_pgtables,
rwsem-contended, unmap_vmas (the latter are for the tlb gather
required to be in atomic context to avoid scheduling to other cpus
while holding the tlb gather).

So you only avoid the need of anon-vma switching to rwsem (because
there's no range-vmtruncate but only rmap uses it on a page-by-page
basis with mmu_notifier_invalidate_page). So at that point IMHO you
can as well add a CONFIG_MMU_NOTIFIER_SLEEPABLE and allow scheduling
everywhere in mmu notifier IMHO, but if you prefer to avoid changing
anon_vma lock to rwsem and add refcounting that is ok with me too. I
just think it'd be cleaner to switch them all to sleepable code if we
have to provide for it and most of the work on the i_mmap_lock side is
mandatory anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/