Re: [PATCH 0/2] mm->def_flags cleanups (Was: Change khugepaged torespect MMF_THP_DISABLE flag)

From: Alex Thorlton
Date: Wed Jan 22 2014 - 13:40:25 EST

On Wed, Jan 22, 2014 at 06:45:53PM +0100, Oleg Nesterov wrote:
> Alex, Andrew, I think this simple series makes sense in any case,
> but _perhaps_ it can also help THP_DISABLE.
> On 01/20, Alex Thorlton wrote:
> >
> > On Mon, Jan 20, 2014 at 09:15:25PM +0100, Oleg Nesterov wrote:
> > >
> > > Although I got lost a bit, and probably misunderstood... but it
> > > seems to me that whatever you do this patch should not touch
> > > khugepaged_scan_mm_slot.
> >
> > Maybe I've gotten myself confused as well :) After looking through the
> > code some more, my understanding is that khugepaged_test_exit is used to
> > make sure that __khugepaged_exit isn't running from underneath at certain
> > times, so to have khugepaged_test_exit return true when __khugepaged_exit
> > is not necessarily running, seems incorrect to me.
> Still can't understand... probably I need to see v3.

I think the appropriate place to check this is actually in
hugepage_vma_check, so you're correct that we don't need to directly
touch khugepaged_scan_mm_slot, we just need to change a different
function underneath it.

We'll table that for now, though. I'll put out a v3 later today, so you
can see what I'm talking about, but I think your idea looks like it will
be better in the end.

> But you know, I have another idea. Not sure you will like it, and probably
> I missed something.
> Can't we simply add VM_NOHUGEPAGE into ->def_flags? See the (untested)
> patch below, on top of this series.
> What do you think?

At a glance, without testing, it looks like a good idea to me. By
using def_flags, we leverage functionality that's already in place to
achieve the same result. We don't need to add any new checks into the
fault path or into khugepaged, since we're just leveraging the
VM_HUGEPAGE/NOHUGEPAGE flag, which we already check for. We also get
the behavior that you suggested (madvise is still respected, even with
the new THP disable prctl set), for free with this method.

I like the idea, but I think that it should probably be a separate
change from the other few cleanups that you proposed along with it, since
they're somewhat unrelated to this particular issue. Do you agree?

Also, one small note on the code:

> diff --git a/kernel/sys.c b/kernel/sys.c
> index ac1842e..eb8b0fc 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2029,6 +2029,19 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> if (arg2 || arg3 || arg4 || arg5)
> return -EINVAL;
> return current->no_new_privs ? 1 : 0;
> + down_write(&me->mm->mmap_sem);
> + if (option == PR_SET_THP_DISABLE) {
> + if (arg2)
> + me->mm->def_flags |= VM_NOHUGEPAGE;
> + else
> + me->mm->def_flags &= ~VM_NOHUGEPAGE;
> + } else {
> + error = !!(me->mm->flags && VM_NOHUGEPAGE);

Should be:

error = !!(me->mm->def_flags && VM_NOHUGEPAGE);


- Alex
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at