Re: Problem with global_flush_tlb() on i386 (x86_64? too) in 2.6.22-rc4-mm2

From: Mathieu Desnoyers
Date: Wed Jun 20 2007 - 14:15:26 EST


* Andi Kleen (ak@xxxxxxx) wrote:
> On Wednesday 20 June 2007 18:46, Mathieu Desnoyers wrote:
> > * Andi Kleen (ak@xxxxxxx) wrote:
> > > On Tuesday 19 June 2007 22:01:36 Mathieu Desnoyers wrote:
> > > > Looking more closely into the code to find the cause of the
> > > > change_page_attr()/global_flush_tlb() inconsistency, I see where the
> > > > problem could be:
> > >
> > > Yes it's a known problem. I have a hack queued for .22 and there
> > > are proposed patches for .23 too.
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/late-merge/patches/cpa-flush
> > >
> > > -Andi
> >
> > Hi Andi,
> >
> > Although I cannot find it at the specified URL, I suspect it is already
> > in Andrew's tree, in 2.6.22-rc4-mm2, under the name
>
> Try again
>
> > "x86_64-mm-cpa-cache-flush.patch"
>
> No, that's a different patch, which also has at least one known bug.
>
> -Andi

Yeah, I guess disabling clflush and calling wbinvd and a full TLB flush
on every CPU is the safe way to go.

However, digging into your previous patch (in Andrew's tree), I think I
found a potential cause for the problem:

__change_page_attr() does a list_add() of &kpte_page->lru. If I am not
mistaken, several consecutive struct page entries can have their PTEs in
the same kpte_page. Each of those pages would then trigger a list_add()
of the same kpte_page. Adding an entry that is already on the list
corrupts its pointers and can create a loop in the linked list, and
therefore a system hang.
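
To illustrate what I mean, here is a minimal userspace sketch. It is not
the code from the patch: list_add() below only mirrors the pointer
updates of the kernel's list_add(), while struct fake_page, the
"deferred" list, walk_bounded() and add_same_entry() are hypothetical
names made up for the example. It shows that adding the same entry twice
leaves the entry pointing at itself, so a walk never gets back to the
list head:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same pointer updates as the kernel's list_add(): insert after head. */
static void list_add(struct list_head *new_node, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = head;
    head->next = new_node;
}

/*
 * Walk the list like list_for_each(), but give up after `limit` steps.
 * Returns the number of entries, or -1 if the walk does not get back
 * to head within `limit` steps.
 */
static int walk_bounded(const struct list_head *head, int limit)
{
    const struct list_head *pos;
    int n = 0;

    assert(limit > 0);
    for (pos = head->next; pos != head; pos = pos->next) {
        if (++n > limit)
            return -1;
    }
    return n;
}

/* Hypothetical stand-in for the page that holds the PTEs. */
struct fake_page {
    struct list_head lru;
};

/* Add the same entry `times` times, then report what a walk sees. */
static int add_same_entry(int times)
{
    struct list_head deferred;      /* hypothetical list of pages */
    struct fake_page kpte_page;
    int i;

    init_list_head(&deferred);
    for (i = 0; i < times; i++)
        list_add(&kpte_page.lru, &deferred);
    return walk_bounded(&deferred, 1000);
}
```

Calling add_same_entry(1) from a small main() walks one entry and comes
back to the head. With add_same_entry(2), the second list_add() sets the
entry's next pointer to the entry itself, so the bounded walk gives up
and returns -1. An unbounded walk in the kernel would simply spin.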

Does that make sense?

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68