tlb_start_vma() / tlb_end_vma() inefficiency (was Re: [PATCH 1/1][ARM] Always do the full MM flush when unmapping VMA)

From: Aaro Koskinen
Date: Tue Mar 03 2009 - 13:19:39 EST


Hello,

Russell King - ARM Linux wrote:
> On Tue, Mar 03, 2009 at 06:23:55PM +0200, Aaro Koskinen wrote:
> > When unmapping N pages (e.g. shared memory) the amount of TLB flushes
> > done is (N*PAGE_SIZE/ZAP_BLOCK_SIZE)*N although it should be N at
> > maximum. With PREEMPT kernel ZAP_BLOCK_SIZE is 8 pages, so there is a
> > noticeable performance penalty and the system is spending its time in
> > flush_tlb_range().
> >
> > The problem is that tlb_end_vma() is passing always the full VMA
> > range. The subrange that needs to be flushed would be available in
> > tlb_finish_mmu(), but the VMA is not available anymore. So always do
> > the full MM flush.
>
> NAK. If we're only unmapping a small VMA, this will result in us knocking
> out all TLB entries. That's far from desirable.
>
> The better solution is to probably seek to change tlb_end_vma() so that
> it knows how much work to do, which does need a generic kernel change
> and therefore to be discussed on lkml.

Ok, fair enough, moving this to lkml.

So, there is a problem in the way tlb_start_vma() and tlb_end_vma() are currently used: unmap_page_range() can be called multiple times when unmapping a VMA, and each time it calls tlb_start_vma()/tlb_end_vma() with the full range, instead of the subrange it's actually unmapping.

On ARM, flush_tlb_range() is called from tlb_end_vma(), so every call goes unnecessarily through the whole VMA range. If I unmap 2048 pages with PREEMPT enabled (ZAP_BLOCK_SIZE is 8 pages), that's 256 calls, each flushing all 2048 pages, i.e. 256*2048 page flushes. You don't even have to measure to see an application freeze when it's unmapping a large area. (On some architectures this problem is not visible at all, since these routines can be NOPs.)
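
To make the arithmetic concrete, here is a small user-space model (not kernel code). It assumes the 8-page zap block mentioned earlier in the thread; struct flush_cost and both cost_*() helpers are hypothetical names made up for this illustration. It just counts how many pages get flushed when every block flushes the whole VMA versus only its own subrange.

```c
/* Assumption: with PREEMPT, one zap block covers 8 pages (as stated in the thread). */
#define ZAP_BLOCK_PAGES 8UL

/* Hypothetical tally for this illustration; not a kernel structure. */
struct flush_cost {
	unsigned long calls;	/* number of range flushes issued */
	unsigned long pages;	/* total pages covered by those flushes */
};

/* Behaviour described above: every zap block flushes the whole VMA. */
static struct flush_cost cost_full_vma(unsigned long vma_pages)
{
	struct flush_cost c = { 0, 0 };
	unsigned long done;

	for (done = 0; done < vma_pages; done += ZAP_BLOCK_PAGES) {
		c.calls++;
		c.pages += vma_pages;
	}
	return c;
}

/* Desired behaviour: every zap block flushes only the pages it unmapped. */
static struct flush_cost cost_subrange(unsigned long vma_pages)
{
	struct flush_cost c = { 0, 0 };
	unsigned long done;

	for (done = 0; done < vma_pages; done += ZAP_BLOCK_PAGES) {
		unsigned long left = vma_pages - done;

		c.calls++;
		c.pages += left < ZAP_BLOCK_PAGES ? left : ZAP_BLOCK_PAGES;
	}
	return c;
}
```

For a 2048-page VMA, cost_full_vma() reports 256 calls covering 256*2048 pages in total, while cost_subrange() reports the same 256 calls covering 2048 pages in total.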

The question is how to fix this. There is currently no good way to implement these routines for architectures that do range-specific TLB flushes. As suggested above by Russell, perhaps it would be reasonable to change the tlb_{start,end}_vma() API so that it also passes on the range that is/was actually unmapped by unmap_page_range()?
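
A rough sketch of what such an interface could look like, again as a user-space model rather than a patch. Everything here is an assumption made for illustration: the model_* names, the extra start/end parameters, and the clamping are hypothetical and do not correspond to any existing kernel signature.

```c
/* Stand-in for the two vm_area_struct fields that matter here. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Records what the modelled architecture flush was asked to do. */
struct flush_log {
	unsigned long calls;
	unsigned long bytes;
};

/* Stand-in for an architecture's range flush. */
static void model_flush_tlb_range(struct flush_log *log,
				  unsigned long start, unsigned long end)
{
	log->calls++;
	log->bytes += end - start;
}

/* Current shape: the hook only sees the VMA, so it flushes all of it. */
static void model_tlb_end_vma(struct flush_log *log,
			      const struct vma_model *vma)
{
	model_flush_tlb_range(log, vma->vm_start, vma->vm_end);
}

/* Hypothetical shape: the caller also passes the range it unmapped. */
static void model_tlb_end_vma_range(struct flush_log *log,
				    const struct vma_model *vma,
				    unsigned long start, unsigned long end)
{
	/* Clamp to the VMA so a bad caller cannot widen the flush. */
	if (start < vma->vm_start)
		start = vma->vm_start;
	if (end > vma->vm_end)
		end = vma->vm_end;
	if (start < end)
		model_flush_tlb_range(log, start, end);
}

/* Mimics a caller that unmaps the VMA block by block (block must be > 0). */
static struct flush_log model_unmap(const struct vma_model *vma,
				    unsigned long block, int pass_subrange)
{
	struct flush_log log = { 0, 0 };
	unsigned long start = vma->vm_start;

	while (start < vma->vm_end) {
		unsigned long end = vma->vm_end - start > block ?
				    start + block : vma->vm_end;

		if (pass_subrange)
			model_tlb_end_vma_range(&log, vma, start, end);
		else
			model_tlb_end_vma(&log, vma);
		start = end;
	}
	return log;
}
```

With the subrange passed in, the total amount flushed equals the size of the VMA regardless of how many blocks the unmap is split into, and a small VMA still only costs a small flush.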

A.