Re: OOM killer, page fault

From: Hugh Dickins
Date: Mon Nov 02 2009 - 11:27:12 EST


On Mon, 2 Nov 2009, Minchan Kim wrote:
> On Mon, 2 Nov 2009 14:02:16 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> >
> > Maybe some code returns VM_FAULT_OOM by mistake and pagefault_oom_killer()
> > is called. digging mm/memory.c is necessary...
> >
> > I wonder why...now is this code
> > ===
> > static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > unsigned long address, pte_t *page_table, pmd_t *pmd,
> > unsigned int flags, pte_t orig_pte)
> > {
> > pgoff_t pgoff;
> >
> > flags |= FAULT_FLAG_NONLINEAR;
> >
> > if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
> > return 0;
> >
> > if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
> > /*
> > * Page table corrupted: show pte and kill process.
> > */
> > print_bad_pte(vma, address, orig_pte, NULL);
> > return VM_FAULT_OOM;
> > }
> >
> > pgoff = pte_to_pgoff(orig_pte);
> > return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> > }
> > ==
> > Then, OOM...is this really OOM ?
>
> It seems that the goal is to kill process by OOM trick as comment said.
>
> I found It results from Hugh's commit 65500d234e74fc4e8f18e1a429bc24e51e75de4a.
> I think it's not a real OOM.
>
> BTW, If it is culpit in this case, print_bad_pte should have remained any log. :)

Yes, the chances are that this is not related to Norbert's problem.
But thank you for reminding me of that not-very-nice hack of mine.

It was kind-of valid at the time that I wrote it (2.6.15), when
VM_FAULT_OOM did kill the faulting process. But since then the fault
path has rightly been changed (in x86 at least, I didn't check the rest)
to let the OOM killer decide who to kill: so now there's a danger that
a pagetable corruption there will instead kill some unrelated process.

Being lazy, I'm inclined simply to change that to VM_FAULT_SIGBUS now:
which doesn't actually guarantee that the process will be killed, but
should be better than just repeatedly re-faulting on the entry. (I
don't much want to SIGKILL current since mm might not be current's.)

That aberrant use of VM_FAULT_OOM has recently been copied into
do_swap_page() (the first instance; the second instance is right -
hmm, well, the second instance is normally right, but I guess it
also covers pagetable corruption cases which we can't distinguish
there; oh well) and should be corrected there too.

Does VM_FAULT_SIGBUS sound good enough to you?

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/