Re: page fault scalability patch V9: [7/7] atomic pte operatiosn for s390

From: Martin Schwidefsky
Date: Tue Sep 28 2004 - 10:20:28 EST


Hi Christoph,

> 7/7: Atomic pte operations for s390
>
> Add atomic PTE operations for S390. This has also not been tested
> yet since I have no S/390 available. Feedback from the S/390 people
> seems to indicate though that the way it is done is fine.

I finally found some time to experiment with your page scalability
patches. Unluckily it is not possible to implement a spinlock free
update of page table entries if we want to keep our ipte optimization.
The ipte instruction is behind our ptep_clear_flush and ptep_establish
primitives. The instruction invalidates a pte by setting the invalid
bit (0x400) and flushes the tlbs on all cpus referring to this pte.
The catch is that the instruction only works on the original,
unmodified pte and that the store of the byte containing the invalid
bit isn't atomic in regard to other changes to the pte. So we have
to take the page table lock for all changes to ptes because ipte
might race against other updates.

Since we don't have that many processors on s390 the selective
purging of the tlb by the ipte instruction is more important then
the spinlock free page table update. The only operation that could
use an atomic update on s390 is pgd_test_and_populate for 64 bit.
Since this is a rather infrequently used operation it doesn't make
too much sense to optimize it. That leaves ptep_xchg_flush as the
operation that needs to get implemented by use of the ipte. You'll
find a new patch that adds an implementation of ptep_xchg_flush for
s390 below, the old patch is broken. Please replace.

There are some more things that need improvement:
* You should introduce a #define for each of the new primitives
(__HAVE_ARCH_PTEP_CMPXCHG, __HAVE_ARCH_PTEP_XCHG_FLUSH, etc) so
that the architectures can redefine the primitives separatly.
For s390 I need to define a ptep_xchg_flush but want to use the
default implementation for all the other new primitives.
* In the generic implementations of the new primitives please use
the mm from the vma to do the locking. To rely on the fact that
the memory structure in the functions that use the macros are always
called "mm" is quite a hack.
* The ptep_get_clear_flush primitive isn't used anywhere and should
be removed.
* The flush_tlb_page in the generic ptep_cmpxchg implementation
shouldn't be there, the macro is used in code sequences where it
isn't needed (the old code didn't flush).

blue skies
Martin.

---
From: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>

Changelog
* Provide ptep_xchg_flush primitive for s390

Signed-off-by: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>

diff -urN linux-2.6.orig/include/asm-generic/pgtable.h linux-2.6/include/asm-generic/pgtable.h
--- linux-2.6.orig/include/asm-generic/pgtable.h 2004-09-28 16:51:25.000000000 +0200
+++ linux-2.6/include/asm-generic/pgtable.h 2004-09-28 16:54:31.000000000 +0200
@@ -134,6 +134,7 @@
* page_table_handling of the cpu may bypass all locking.
*/

+#ifndef __HAVE_ARCH_PTEP_XCHG_FLUSH
#define ptep_xchg_flush(__vma, __address, __ptep, __pteval) \
({ \
pte_t __pte; \
@@ -144,6 +145,7 @@
spin_unlock(&mm->page_table_lock); \
__pte; \
})
+#endif


#define ptep_get_clear_flush(__vma, __address, __ptep) \
diff -urN linux-2.6.orig/include/asm-s390/pgtable.h linux-2.6/include/asm-s390/pgtable.h
--- linux-2.6.orig/include/asm-s390/pgtable.h 2004-08-14 07:37:38.000000000 +0200
+++ linux-2.6/include/asm-s390/pgtable.h 2004-09-28 16:54:09.000000000 +0200
@@ -567,6 +567,17 @@
return pte;
}

+#define ptep_xchg_flush(__vma, __address, __ptep, __pteval) \
+({ \
+ struct mm_struct *__mm = __vma->vm_mm; \
+ pte_t __pte; \
+ spin_lock(&__mm->page_table_lock); \
+ __pte = ptep_clear_flush(__vma, __address, __ptep); \
+ set_pte(__ptep, __pteval); \
+ spin_unlock(&__mm->page_table_lock); \
+ __pte; \
+})
+
static inline void ptep_set_wrprotect(pte_t *ptep)
{
pte_t old_pte = *ptep;
@@ -796,6 +807,7 @@
#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
#define __HAVE_ARCH_PTEP_CLEAR_FLUSH
+#define __HAVE_ARCH_PTEP_XCHG_FLUSH
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTEP_MKDIRTY
#define __HAVE_ARCH_PTE_SAME
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/