Re: [PATCH] - Fix unmap_vma() bug related to mmu_notifiers

From: Robin Holt
Date: Thu Jan 28 2010 - 08:25:17 EST


On Thu, Jan 28, 2010 at 11:03:27AM +0100, Andrea Arcangeli wrote:
> On Wed, Jan 27, 2010 at 09:49:44PM -0600, Robin Holt wrote:
> > > I think that with the SRCU patch, we have enough. Is that true or have
> > > I missed something?
> >
> > I wasn't quite complete in my previous email. Your srcu patch
> > plus Jack's patch to move the tlb_gather_mmu to after the
> > mmu_notifier_invalidate_range_start().
>
> My pmdp_clear_flush_notify with transparent hugepage will give some
> trouble because it's using mmu_notifier_invalidate_range_start to
> provide backwards compatible API to mmu notifier users like GRU that
> may be mapping physical hugepages with 4k secondary tlb mappings
> (which have to be all invalidated not only the first one). So that
> would still require the full series as it's like if the rmap code
> would be using mmu_notifier_invalidate_range_start. But we can
> probably get away by forcing all mmu notifier methods to provide a
> mmu_notifier_invalidate_hugepage.

The GRU is using a hardware TLB of 2MB page size when the
is_vm_hugetlb_page() indicates it is a 2MB vma. From my reading of it,
your callout to mmu_notifier_invalidate_page() will work for gru and I
think it will work for XPMEM as well, but I am not certain of that yet.
I am certain that it can be made to work for XPMEM. I think using the
range callouts are actually worse because now we are mixing the conceptual
uses of page and range.

> But in addition to srcu surely you also need i_mmap_lock_to_sem for
> unmap_mapping_range_vma taking i_mmap_lock, basically you missed
> truncate. Which then in cascade requires free_pgtables,

I must be missing something key here. I thought unmap_mapping_range_vma
would percolate down to calling mmu_notifier_invalidate_page() which
xpmem can sleep in. Based upon that assumption, I don't see the
need for the other patches.

> rwsem-contended, unmap_vmas (the latter are for the tlb gather
> required to be in atomic context to avoid scheduling to other cpus
> while holding the tlb gather).
>
> So you only avoid the need of anon-vma switching to rwsem (because
> there's no range-vmtruncate but only rmap uses it on a page-by-page
> basis with mmu_notifier_invalidate_page). So at that point IMHO you
> can as well add a CONFIG_MMU_NOTIFIER_SLEEPABLE and allow scheduling
> everywhere in mmu notifier IMHO, but if you prefer to avoid changing
> anon_vma lock to rwsem and add refcounting that is ok with me too. I
> just think it'd be cleaner to switch them all to sleepable code if we
> have to provide for it and most of the work on the i_mmap_lock side is
> mandatory anyway.

I don't see the mandatory part here. Maybe it is your broken english
combined with my ignorance, but I do not see what the statement
"i_mmap_lock side is mandatory" is based upon. It looks to me like
everywhere that is calling an mmu_notifier_invalidate_* while holding
the i_mmap_lock is calling mmu_notifier_invalidate_page(). That is
currently safe for sleeping as far as XPMEM is concerned. Is there a
place that is calling mmu_notifier_invalidate_range_*() while holding
the i_mmap_lock which I have missed?

Thanks,
Robin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/