Re: [PATCHv2, RFC 08/30] thp, mm: rewrite add_to_page_cache_locked() to support huge pages

From: Dave Hansen
Date: Thu Mar 21 2013 - 13:09:53 EST


On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> For a huge page we add HPAGE_CACHE_NR pages to the radix tree at once:
> the head page at the specified index and HPAGE_CACHE_NR-1 tail pages at
> the following indexes.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
> mm/filemap.c | 76 ++++++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2d99191..6bac9e2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> pgoff_t offset, gfp_t gfp_mask)
> {
> int error;
> + int nr = 1;
>
> VM_BUG_ON(!PageLocked(page));
> VM_BUG_ON(PageSwapBacked(page));
> @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> error = mem_cgroup_cache_charge(page, current->mm,
> gfp_mask & GFP_RECLAIM_MASK);
> if (error)
> - goto out;
> + return error;
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> - if (error == 0) {
> - page_cache_get(page);
> - page->mapping = mapping;
> - page->index = offset;
> + if (PageTransHuge(page)) {
> + BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> + nr = HPAGE_CACHE_NR;
> + }

That seems like a slightly odd place to put a BUILD_BUG_ON(). I guess
it doesn't matter to some degree, but does putting it inside the if()
imply anything?
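
FWIW, since BUILD_BUG_ON() is purely a compile-time check, it fires whether
or not the branch is ever taken, so hoisting it above the if() changes
nothing at runtime. A userspace sketch with C11 _Static_assert (the
constants here are made-up stand-ins for the kernel ones):

```c
/* Toy stand-ins for the kernel constants -- values are assumptions. */
#define HPAGE_CACHE_NR 512
#define RADIX_TREE_PRELOAD_NR 512

/* Equivalent of BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR):
 * checked at compile time regardless of any runtime control flow. */
_Static_assert(HPAGE_CACHE_NR <= RADIX_TREE_PRELOAD_NR,
               "preload must cover a full huge page");

/* Models picking nr based on whether the page is huge. */
int page_cache_nr(int is_huge)
{
	return is_huge ? HPAGE_CACHE_NR : 1;
}
```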

> + error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> + if (error) {
> + mem_cgroup_uncharge_cache_page(page);
> + return error;
> + }
>
> - spin_lock_irq(&mapping->tree_lock);
> - error = radix_tree_insert(&mapping->page_tree, offset, page);
> - if (likely(!error)) {
> - mapping->nrpages++;
> - __inc_zone_page_state(page, NR_FILE_PAGES);
> - spin_unlock_irq(&mapping->tree_lock);
> - trace_mm_filemap_add_to_page_cache(page);
> - } else {
> - page->mapping = NULL;
> - /* Leave page->index set: truncation relies upon it */
> - spin_unlock_irq(&mapping->tree_lock);
> - mem_cgroup_uncharge_cache_page(page);
> - page_cache_release(page);

I do really like how this rewrite de-indents this code. :)

> + page_cache_get(page);
> + spin_lock_irq(&mapping->tree_lock);
> + page->mapping = mapping;
> + page->index = offset;
> + error = radix_tree_insert(&mapping->page_tree, offset, page);
> + if (unlikely(error))
> + goto err;
> + if (PageTransHuge(page)) {
> + int i;
> + for (i = 1; i < HPAGE_CACHE_NR; i++) {
> + page_cache_get(page + i);
> + page[i].index = offset + i;

Is it OK to leave page->mapping unset for these?

> + error = radix_tree_insert(&mapping->page_tree,
> + offset + i, page + i);
> + if (error) {
> + page_cache_release(page + i);
> + break;
> + }
> }

Throughout all this new code, I'd really challenge you to try as much as
possible to minimize the code stuck under "if (PageTransHuge(page))".

For instance, could you change the for() loop a bit and have it shared
between both cases, like:

> + for (i = 0; i < nr; i++) {
> + page_cache_get(page + i);
> + page[i].index = offset + i;
> + error = radix_tree_insert(&mapping->page_tree,
> + offset + i, page + i);
> + if (error) {
> + page_cache_release(page + i);
> + break;
> + }
> }

> - radix_tree_preload_end();
> - } else
> - mem_cgroup_uncharge_cache_page(page);
> -out:
> + if (error) {
> + error = -ENOSPC; /* no space for a huge page */
> + for (i--; i > 0; i--) {
> + radix_tree_delete(&mapping->page_tree,
> + offset + i);
> + page_cache_release(page + i);
> + }
> + radix_tree_delete(&mapping->page_tree, offset);

I wonder if this would look any nicer if you just did all the
page_cache_get()s for the entire huge page along with the head page, and
then released them all in one place. I think it might shrink the error
handling paths here.
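
Concretely, a toy model of that variant: take every reference before
touching the tree, so the failure path only has to unwind tree slots and
then drop all the references in one spot (again, an array and counters
stand in for the real kernel structures, and the names are invented):

```c
#define SLOTS 16

static int tree2[SLOTS]; /* 0 = empty slot; models the radix tree */
static int ref2[SLOTS];  /* models per-page reference counts */

static int add_pages_upfront(int offset, int nr)
{
	int i, inserted = 0, error = 0;

	/* All the page_cache_get()s happen together, up front. */
	for (i = 0; i < nr; i++)
		ref2[offset + i]++;

	for (i = 0; i < nr; i++) {
		if (offset + i >= SLOTS || tree2[offset + i]) {
			error = -1;     /* models insert failure */
			break;
		}
		tree2[offset + i] = 1;
		inserted++;
	}
	if (error) {
		while (inserted--)      /* drop the tree slots we took... */
			tree2[offset + inserted] = 0;
		for (i = 0; i < nr; i++) /* ...then all refs in one place */
			ref2[offset + i]--;
	}
	return error;
}
```

The error handling collapses to two plain loops with no special head-page
case, which is roughly the shrinkage being suggested above.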

> + goto err;
> + }
> + }
> + __mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
> + mapping->nrpages += nr;
> + spin_unlock_irq(&mapping->tree_lock);
> + trace_mm_filemap_add_to_page_cache(page);

Do we need to change the tracing to make sure it notes that these were
or weren't huge pages?

> + radix_tree_preload_end();
> + return 0;
> +err:
> + page->mapping = NULL;
> + /* Leave page->index set: truncation relies upon it */
> + spin_unlock_irq(&mapping->tree_lock);
> + radix_tree_preload_end();
> + mem_cgroup_uncharge_cache_page(page);
> + page_cache_release(page);
> return error;
> }
> EXPORT_SYMBOL(add_to_page_cache_locked);

Does the cgroup code know how to handle these large pages internally
somehow? It looks like the charge/uncharge is only being done for the
head page.