Re: What was the problem with quicklists and x86-64?

From: Siddha, Suresh B
Date: Thu Dec 13 2007 - 17:29:54 EST


On Thu, Dec 13, 2007 at 11:47:29AM -0800, Christoph Lameter wrote:
> On Wed, 12 Dec 2007, Jeremy Fitzhardinge wrote:
>
> > I'm looking at unifying the various pgalloc+pgd_lists mechanisms between
> > 32-bit (PAE and non-PAE) and 64-bit, so I'm trying to understand why
> > these differences exist in the first place.
> >
> > Change da8f153e51290e7438ba7da66234a864e5d3e1c1 reverted the use of
> > quicklists for allocating pagetables, because of concerns about ordering
> > with respect to tlb flushes.
>
> These issues only exist with NUMA because of the freeing of off node pages
> before the TLB flush was done. There was a discussion about this issue and
> my fix of simply not freeing the offnode pages early was ignored. Instead
> the x86_64 implementation (which has no problems that I know of) was

The NUMA bug might not be the only problem. I think there are more issues,
as Linus noticed:

<snip>
Oh, and I see what's wrong: you not only switched "free_page()" to
"quicklist_free()", you *also* switched "tlb_remove_page()" to
"quicklist_free()".
</snip>

The above comment refers to the following portion of the change:

-#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
+#define __pte_free_tlb(tlb,pte) quicklist_free_page(QUICK_PT, NULL,(pte))

tlb_remove_page() was setting tlb->need_flush, which is later checked
by tlb_flush_mmu(). With quicklist_free_page() we lose all that.

Now consider a corner case: a big munmap() calls unmap_region(), and the
region being unmapped happens to have page tables set up but with all
PTEs set to NULL. unmap_region() may then free the page table pages
[ tlb_finish_mmu() calls check_pgt_cache(), which trims the quicklist ]
without flushing the TLBs
[ tlb_finish_mmu() calls tlb_flush_mmu(), but that will not do
much as need_flush is not set ].
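
To make the sequence above concrete, here is a toy user-space model in C.
It is only a sketch of the control flow as I described it, not kernel code:
struct toy_gather and every toy_* function are made-up stand-ins for
struct mmu_gather, tlb_remove_page(), quicklist_free_page(),
tlb_flush_mmu() and tlb_finish_mmu(), and the counters exist purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mmu_gather; only the flag and a few
 * illustrative counters are modelled. */
struct toy_gather {
    bool need_flush;      /* set when a page is queued through the tlb path */
    int  queued;          /* pages waiting in the gather for a flush */
    int  quicklist_len;   /* page table pages parked on the quicklist */
    int  tlb_flushes;     /* number of simulated TLB flushes */
    int  freed_unflushed; /* pages released although no flush happened */
};

/* Models tlb_remove_page(): queue the page and request a flush. */
static void toy_tlb_remove_page(struct toy_gather *tlb)
{
    tlb->need_flush = true;
    tlb->queued++;
}

/* Models quicklist_free_page(): parks the page, never touches need_flush. */
static void toy_quicklist_free_page(struct toy_gather *tlb)
{
    tlb->quicklist_len++;
}

/* Models tlb_flush_mmu(): does nothing unless need_flush was set. */
static void toy_tlb_flush_mmu(struct toy_gather *tlb)
{
    if (!tlb->need_flush)
        return;
    tlb->need_flush = false;
    tlb->tlb_flushes++;
    tlb->queued = 0;      /* queued pages are released only after the flush */
}

/* Models tlb_finish_mmu(): flush if requested, then trim the quicklist
 * (the check_pgt_cache() step) whether or not a flush took place. */
static void toy_tlb_finish_mmu(struct toy_gather *tlb)
{
    toy_tlb_flush_mmu(tlb);
    if (tlb->tlb_flushes == 0)
        tlb->freed_unflushed += tlb->quicklist_len;
    tlb->quicklist_len = 0;
}

/* Unmap a region that has pt_pages page table pages but no populated PTEs,
 * so nothing else sets need_flush. Returns how many page table pages were
 * freed without a preceding TLB flush. */
int toy_unmap_empty_region(bool use_quicklist, int pt_pages)
{
    struct toy_gather tlb = {0};

    assert(pt_pages >= 0);
    for (int i = 0; i < pt_pages; i++) {
        if (use_quicklist)
            toy_quicklist_free_page(&tlb);  /* new __pte_free_tlb() */
        else
            toy_tlb_remove_page(&tlb);      /* old __pte_free_tlb() */
    }
    toy_tlb_finish_mmu(&tlb);
    return tlb.freed_unflushed;
}
```

In this model, toy_unmap_empty_region(false, n) reports no pages freed
ahead of a flush, while toy_unmap_empty_region(true, n) reports all n of
them, which is the window I am worried about.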

Similarly, Linus brought up preemption issues associated with quicklist usage.

thanks,
suresh