Re: [PATCH 9/12] ksm: fix oom deadlock

From: Andrea Arcangeli
Date: Tue Aug 25 2009 - 13:51:17 EST


On Tue, Aug 25, 2009 at 06:35:56PM +0100, Hugh Dickins wrote:
> "make munlock fast when mlock is canceled by sigkill". It's just
> idiotic that munlock (in this case, munlocking pages on exit) should
> be trying to fault in pages, and that causes its own problems when

I also considered addressing this by fixing the automatic munlock,
but if it's asking for trouble to cause page faults with
mm_users == 0 in munlock, it's equally asking for trouble to cause
page faults with mm_users == 0 in ksm. So if munlock is wrong, ksm
was wrong too, and I tried to fix ksm not to do that, while leaving
the munlock fix for later/others.. ;)

> I have now made a patch with munlock_vma_pages_range() doing a
> follow_page() loop instead of faulting in; but I've not yet tested

That is a separate problem in my view.

> I'd prefer not to have them too, but haven't yet worked out how to
> get along safely without them.

ok.

> But the mmap_sem is not enough to exclude the mm exiting
> (until __ksm_exit does its little down_write,up_write dance):
> break_cow etc. do the ksm_test_exit check on mm_users before
> proceeding any further, but that's just not enough to prevent
> break_ksm's handle_pte_fault racing with exit_mmap - hence the
> ksm_test_exits in mm/memory.c, to stop ptes being instantiated
> after the final zap thinks it's wiped the pagetables.
>
> Let's look at your actual patch...

I tried to work out how to get along safely without them. In short,
my patch makes mmap_sem plus the ksm_test_exit check on mm_users
before proceeding any further "enough" (while still allowing the ksm
loop to bail out if mm_users suddenly reaches zero because of the oom
killer).
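
The ordering argument above can be illustrated with a minimal
userspace sketch in C. It is not the patch itself: struct toy_mm, the
toy_* helpers and the plain integer standing in for mmap_sem are all
hypothetical, and the sketch assumes the exit path takes mmap_sem for
writing after mm_users has dropped to zero and before the final zap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the few mm_struct fields involved. */
struct toy_mm {
    atomic_int mm_users; /* address-space users; 0 means the mm is exiting */
    int sem;             /* toy mmap_sem: >0 readers, -1 writer, 0 free */
    int ptes;            /* count of instantiated "ptes" */
    bool zapped;         /* set once the final zap has run */
};

static bool toy_down_read_trylock(struct toy_mm *mm)
{
    if (mm->sem < 0)
        return false;
    mm->sem++;
    return true;
}

static void toy_up_read(struct toy_mm *mm)
{
    assert(mm->sem > 0);
    mm->sem--;
}

/* ksm side: take the lock for reading, then check mm_users. */
static bool toy_ksm_begin(struct toy_mm *mm)
{
    if (!toy_down_read_trylock(mm))
        return false;
    if (atomic_load(&mm->mm_users) == 0) { /* models ksm_test_exit() */
        toy_up_read(mm);
        return false;
    }
    return true;
}

/* Stands in for the fault done by break_ksm(); caller holds the read lock. */
static void toy_ksm_fault(struct toy_mm *mm)
{
    assert(mm->sem > 0 && !mm->zapped);
    mm->ptes++;
}

/* exit side, step 1: drop a user; returns true if it was the last one. */
static bool toy_mmput(struct toy_mm *mm)
{
    return atomic_fetch_sub(&mm->mm_users, 1) == 1;
}

/*
 * exit side, step 2: write-lock, zap, unlock. Returns false while the
 * lock is held by a reader; a real rwsem would sleep here instead.
 */
static bool toy_exit_mmap(struct toy_mm *mm)
{
    if (mm->sem != 0)
        return false;
    mm->sem = -1;       /* down_write */
    mm->ptes = 0;       /* final zap */
    mm->zapped = true;
    mm->sem = 0;        /* up_write */
    return true;
}
```

In this model a ksm pass that has already passed the mm_users check
holds the lock for reading, so toy_exit_mmap() cannot zap until that
pass drops the lock; any pass that starts after mm_users reached zero
bails out in toy_ksm_begin() without instantiating anything.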

Furthermore the mmap_sem is already guaranteed to be L1 hot and
exclusive, because we wrote to it a few nanoseconds before calling
mmput. To be fair, locked ops are not cheap, but I'd rather add two
locked ops to the last exit syscall of a thread group than a new
branch to every single page fault, as there are tons more page faults
than exit syscalls.