Re: + x86-mm-handle-mm_fault_error-in-kernel-space.patch added to -mm tree

From: KOSAKI Motohiro
Date: Sun Mar 13 2011 - 06:29:31 EST


> On 03/11, Oleg Nesterov wrote:
> >
> > On 03/11, Andrew Vagin wrote:
> > >
> > >
> > The point is, if current was _NOT_ killed we should follow the current
> > pagefault_out_of_memory() logic or remove pagefault_out_of_memory()
> > completely.
>
> Yes, and I still think this is valid. And thus I still think the patch
> should be changed (btw, this problem is not x86 specific).
>
> However,
>
> > >> Why do you think the current task should be killed? In this case we
> > >> do not need oom-killer at all, we could always kill the caller of
> > >> alloc_page/etc.
> > >
> > > You don't understand. alloc_page calls oom-killer himself, then try
> > > allocate memory again. Pls look at __alloc_pages_slowpath().
> > > __alloc_pages_slowpat may fail if order > 3 || gfp_mask & __GFP_NOFAIL
> > > || test_thread_flag(TIF_MEMDIE)
> >
> > Andrew, please, I know this.
>
> Hmm. It turns out I do not ;)
>
> I thought I can find the case when handle_mm_fault() returns VM_FAULT_OOM
> and the caller is not killed, but I can't. I do not really understand
> mem_cgroup_handle_oom/etc, but it seems we always retry indefinitely even
> with mem_cgroup's. mm/hugetlb.c looks fine too...
>
> So, I have to apologize, I am starting to think you are right.
>
> Maybe someone could explain why pagefault_out_of_memory() is still
> needed?

Hi Oleg, Andrew,

Now you are seeing the dark side of the VM. ;-)
Two independent commits introduced this hard-to-understand code.

commit 1c0fe6e3bda0464728c23c8d84aa47567e8b716c
Author: Nick Piggin <npiggin@xxxxxxx>
Date: Tue Jan 6 14:38:59 2009 -0800

mm: invoke oom-killer from page fault

commit 6583bb64fc370842b32a87c67750c26f6d559af0
Author: David Rientjes <rientjes@xxxxxxxxxx>
Date: Wed Jul 29 15:02:06 2009 -0700

mm: avoid endless looping for oom killed tasks

The most typical case is, as Andrew described, handle_mm_fault -> pte_alloc_one
-> alloc_pages_current(GFP_KERNEL, 0). An order-0 GFP_KERNEL allocation
never fails unless the task has received TIF_MEMDIE. Therefore, in this case,
no additional pagefault_out_of_memory() call is needed. In any case,
pagefault_out_of_memory() is a no-op if the task already has TIF_MEMDIE.

But we have no guarantee that the page fault path contains neither a large
allocation nor a GFP_ATOMIC allocation. Therefore I think Oleg's patch pointed
out the right thing. The protocol is: vma->vm_ops->fault() can return
VM_FAULT_OOM, and if it does, the page fault handler should invoke the
out-of-memory handling.
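
To make the protocol concrete, here is a toy userspace model of it. This is
only a sketch, not kernel code: every toy_* name is hypothetical, the memdie
flag merely stands in for TIF_MEMDIE, and the counter stands in for the
out-of-memory path doing real work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */

enum toy_fault_result { TOY_FAULT_OK, TOY_FAULT_OOM };

struct toy_task {
	bool memdie;          /* stands in for TIF_MEMDIE */
	int oom_invocations;  /* how often the OOM path did real work */
};

/*
 * Stand-in for pagefault_out_of_memory(): it does nothing for a task
 * that has already been chosen to die, otherwise it counts one
 * out-of-memory invocation.
 */
static void toy_pagefault_oom(struct toy_task *t)
{
	assert(t != NULL);
	if (t->memdie)
		return;
	t->oom_invocations++;
}

/*
 * Stand-in for the page fault handler's error path: if the fault
 * callback reported OOM, invoke the out-of-memory handling.
 */
static void toy_handle_fault_result(struct toy_task *t,
				    enum toy_fault_result r)
{
	if (r == TOY_FAULT_OOM)
		toy_pagefault_oom(t);
}
```

In this model, a fault that reports OOM for a task without the flag reaches
the out-of-memory path once, while the same report for a task that already
has the flag changes nothing.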

But I doubt any practical workload can observe the difference.


