The major obstacles that need to be addressed:
* Concurrent page state changes:
To guard against concurrent page state updates some kind of lock
is needed. Once page_make_volatile() has done the 11 checks it
issues the state change primitive. If one of the conditions changes
in the meantime, the user that requires the page in stable state
has to wait in page_make_stable() until the make volatile operation
has finished. It is up to the architecture to define how this is
done with the three primitives page_test_set_state_change,
page_clear_state_change and page_state_change.
There are several ways to do this, e.g. a global lock, a lock per
segment in the kernel page table, or the per-page bit PG_arch_1 if
it is still free.
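As an illustration of the per-page bit variant, here is a minimal
userspace sketch. It is not the architecture code from this patch:
struct page, PG_STATE_CHANGE, the conditions_ok parameter and the
function bodies are all assumptions made for the example. Only the
three primitive names and the two page_make_*() names come from the
description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a free per-page flag bit such as PG_arch_1. */
#define PG_STATE_CHANGE 0x1UL

enum page_state { PAGE_STABLE, PAGE_VOLATILE };

/* Simplified model of a page; NOT the kernel's struct page. */
struct page {
	atomic_ulong flags;
	enum page_state state;
};

/* Take the per-page lock bit. Returns nonzero if it was already held. */
static int page_test_set_state_change(struct page *page)
{
	return (atomic_fetch_or(&page->flags, PG_STATE_CHANGE)
		& PG_STATE_CHANGE) != 0;
}

/* Query whether a state change is currently in flight. */
static int page_state_change(struct page *page)
{
	return (atomic_load(&page->flags) & PG_STATE_CHANGE) != 0;
}

/* Release the per-page lock bit; the caller must hold it. */
static void page_clear_state_change(struct page *page)
{
	assert(page_state_change(page));
	atomic_fetch_and(&page->flags, ~PG_STATE_CHANGE);
}

/*
 * conditions_ok models the outcome of the checks (assumption).
 * Returns 1 if the page was made volatile, 0 otherwise.
 */
static int page_make_volatile(struct page *page, bool conditions_ok)
{
	if (page_test_set_state_change(page))
		return 0;	/* another state change is in flight */
	if (conditions_ok)
		page->state = PAGE_VOLATILE;
	page_clear_state_change(page);
	return conditions_ok ? 1 : 0;
}

/* Wait for a concurrent make volatile to finish, then make stable. */
static void page_make_stable(struct page *page)
{
	while (page_test_set_state_change(page))
		;	/* spin until the bit is released */
	page->state = PAGE_STABLE;
	page_clear_state_change(page);
}
```

In this sketch page_make_volatile() gives up if the bit is already
set, while page_make_stable() spins until it can take the bit, which
matches the asymmetry described above: the stable user must wait, the
volatile transition may simply be skipped. A global lock or a lock
per segment would trade this per-page granularity for simplicity.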
Signed-off-by: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
@@ -601,6 +604,21 @@ copy_one_pte(struct mm_struct *dst_mm, s
out_set_pte:
set_pte_at(dst_mm, addr, dst_pte, pte);
+ return;
+
+out_discard_pte:
+ /*
+ * If the page referred by the pte has the PG_discarded bit set,
+ * copy_one_pte is racing with page_discard. The pte may not be
+ * copied or we can end up with a pte pointing to a page not
+ * in the page cache anymore. Do what try_to_unmap_one would do
+ * if the copy_one_pte had taken place before page_discard.
+ */
+ if (page->index != linear_page_index(vma, addr))
+ /* If nonlinear, store the file page offset in the pte. */
+ set_pte_at(dst_mm, addr, dst_pte, pgoff_to_pte(page->index));
+ else
+ pte_clear(dst_mm, addr, dst_pte);
}
@@ -1390,6 +1391,7 @@ int test_clear_page_writeback(struct pag
radix_tree_tag_clear(&mapping->page_tree,
page_index(page),
PAGECACHE_TAG_WRITEBACK);
+ page_make_volatile(page, 1);
if (bdi_cap_account_writeback(bdi)) {
__dec_bdi_stat(bdi, BDI_WRITEBACK);
__bdi_writeout_inc(bdi);