[PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation

From: Michal Hocko
Date: Thu Mar 19 2015 - 09:46:07 EST


page_cache_read has been historically using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems are setting this mask to
GFP_NOFS to prevent from fs recursion issues. page_cache_read is,
however, not called from the fs layer so it doesn't need this
protection.

The protection might be even harmful. There is a strong push to fail
GFP_NOFS allocations rather than loop within allocator indefinitely with
a very limited reclaim ability. Once we start failing those requests
the OOM killer might be triggered prematurely because the page cache
allocation failure is propagated up the page fault path and end up in
pagefault_out_of_memory.

This patch updates mapping_gfp_mask and adds both __GFP_FS and __GFP_IO
in __do_fault before we the address space fault handler is called and
restores the original flags after the callback. This will allow to use
the go into FS from the direct reclaim if needed and prevent from
pre-mature OOM killer for most filesystems and still allow FS layer
to overwrite the mask from the .fault handler should there be a need.

Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
mm/memory.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index c4b08e3ef058..ba528787e25f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2701,13 +2701,24 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
{
struct vm_fault vmf;
int ret;
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ gfp_t mapping_gfp;

vmf.virtual_address = (void __user *)(address & PAGE_MASK);
vmf.pgoff = pgoff;
vmf.flags = flags;
vmf.page = NULL;

+ /*
+ * Some filesystems always drop __GFP_FS to prevent from reclaim
+ * recursion back to FS code. This is not the case here because
+ * we are at the top of the call chain. Add GFP_FS flags to prevent
+ * from premature OOM killer.
+ */
+ mapping_gfp = mapping_gfp_mask(mapping);
+ mapping_set_gfp_mask(mapping, mapping_gfp | __GFP_FS | __GFP_IO);
ret = vma->vm_ops->fault(vma, &vmf);
+ mapping_set_gfp_mask(mapping, mapping_gfp);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
return ret;

--
2.1.4

> However I am a bit concerned about drivers/block/loop.c.
> Might a filesystem read on the loop block device wait for a page_cache_read()
> on the loop-mounted file? In that case you really don't want __GFP_FS set
> when allocating that page.

I guess I see what you mean but I fail to see how would the deadlock
happen from the page fault path.
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/