Re: page fault scalability patch: Need help/testing for x86_64, s390, ppc and ppc64

From: Martin Schwidefsky
Date: Wed Sep 01 2004 - 07:10:54 EST


Hi Christoph,

> If you are in posssession of or are a kernel developer for x86_64, s390,
> ppc or ppc64 then please review the relevant arch specific portions and
> after fixing up my stuff please test and see if this actually works on
> your platform.

Well, the patch didn't apply on 2.6.8, 2.6.9-rc1 or BitKeeper. Please
recreate on a defined kernel level.

> The patch needs the following additional atomic operations:
>
> pgtable.h: ptep_xchg (variation on ptep_test_and_clear) and ptep_cmpxchg
> pgalloc.h: pgd_test_and_populate, pmd_test_and_populate.

So far ptep_get_and_clear is called with the page table lock held. That
means that you don't have to protect yourself against concurrent updates
of a pte on different processors. The only thing you need to ensure is
that the update of a pte is atomic in the sense that a page table walk
by the hardware either sees the new or the old value.
Without the page table lock all of a sudden you have to make sure that
the sequence of the get and the clear is atomic, otherwise you have a
race on a SMP system. That means that you have to implement ptep_xchg
using an atomic instruction. You can't just read and store the pte like
you do in your patch. Use xchg. And I fear that there are other subtle
races as well if you remove the page table lock.

> pgalloc.h: pgd_test_and_populate, pmd_test_and_populate.

One of the problems with you patch is that on s/390 you can't do an atomic
update of a page middle directory (pmd). On 64 bit there are two 8 byte
words and on 31 bit there are four 4 bytes words that need to be modified
for a single pmd entry. You need to use the page table lock to update pmds.
Another problem is that an invalid PTE/PMD does not equal 0. For invalid,
empty ptes the value is _PAGE_INVALID_EMPTY and for invalid, empty pmds the
value is _PAGE_TABLE_INV for 31 bit and _PMD_ENTRY_INV for 64 bit.

> One note on S/390: It seems that ptep_test_and_clear does not use an
> atomic operation. Is this correct? I have implemenmted the ptep_xchg
> on S/390 also with non-atomic operations. Maybe for some unknown reason
> (cannot imagine but ??) there is no need for atomic exchange operations
> for ptes?

Last time I looked there wasn't a ptep function called ptep_test_and_clear.
If you are refering to ptep_get_and_clear aka pte_clear, on s/390 the store
of a single word is atomic. I checked that the compiler really generates
store instructions ("ST"/"STG") to write something to a ptep pointer.

blue skies,
Martin.

P.S. You should send patches against the memory management to the linux-mm
list.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/