Re: [PATCH v4 2/2] ksm: provide support to use deferrable timers for scanner thread

From: Peter Zijlstra
Date: Mon Sep 08 2014 - 05:41:36 EST


On Mon, Sep 08, 2014 at 01:25:36AM -0700, Hugh Dickins wrote:
> Well, yes, but... how do we know when there is no more work to do?

Yeah, I figured that out _after_ I send that email..

> Thomas has given reason why KSM might simply fail to do its job if we
> rely on the deferrable timer. So I've tried another approach, patch
> below; but I do not expect you to jump for joy at the sight of it!

Indeed :/

> I've tried to minimize the offensive KSM hook in context_switch().
> Why place it there, rather than do something near profile_tick() or
> account_process_tick()? Because KSM is aware of mms not tasks, and
> context_switch() should have the next mm cachelines hot (if not, a
> slight regrouping in mm_struct should do it); whereas I can find
> no reference whatever to mm_struct in kernel/time, so hooking to
> KSM from there would drag in another few cachelines every tick.
>
> (Another approach would be to set up KSM hint faulting, along the
> lines of NUMA hint faulting. Not a path I'm keen to go down.)
>
> I'm not thrilled with this patch, I think it's somewhat defective
> in several ways. But maybe in practice it will prove good enough,
> and if so then I'd rather not waste effort on complicating it.
>
> My own testing is not realistic, nor representative of real KSM users;
> and I have no idea what values of pages_to_scan and sleep_millisecs
> people really use (and those may make quite a difference to how
> well it works).
>
> Chintan, even if the scheduler guys turn out to hate it, please would
> you give the patch below a try, to see how well it works in your
> environment, whether it seems to go better or worse than your own patch.
>
> If it works well enough for you, maybe we can come up with ideas to
> make it more palatable. I do think your issue is an important one
> to fix, one way or another.
>
> Thanks,
> Hugh
>
> [PATCH] ksm: avoid periodic wakeup while mergeable mms are quiet
>
> Description yet to be written!
>
> Reported-by: Chintan Pandya <cpandya@xxxxxxxxxxxxxx>
> Not-Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
>
> include/linux/ksm.h | 14 +++++++++++
> include/linux/sched.h | 1
> kernel/sched/core.c | 9 ++++++-
> mm/ksm.c | 50 ++++++++++++++++++++++++++++------------
> 4 files changed, 58 insertions(+), 16 deletions(-)
>
> --- 3.17-rc4/include/linux/ksm.h 2014-03-30 20:40:15.000000000 -0700
> +++ linux/include/linux/ksm.h 2014-09-07 11:54:41.528003316 -0700

> @@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st
> {
> }
>
> +static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)

s/ksm_switch/__&/

> +{
> + return NULL;
> +}
> +
> static inline int PageKsm(struct page *page)
> {
> return 0;

> --- 3.17-rc4/kernel/sched/core.c 2014-08-16 16:00:54.062189063 -0700
> +++ linux/kernel/sched/core.c 2014-09-07 11:54:41.528003316 -0700

> @@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas
> struct task_struct *next)
> {
> struct mm_struct *mm, *oldmm;
> + wait_queue_head_t *wake_ksm = NULL;
>
> prepare_task_switch(rq, prev, next);
>
> @@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas
> next->active_mm = oldmm;
> atomic_inc(&oldmm->mm_count);
> enter_lazy_tlb(oldmm, next);
> - } else
> + } else {
> switch_mm(oldmm, mm, next);
> + wake_ksm = ksm_switch(mm);

Is this the right mm? We've just switched the stack, so we're looking at
next->mm when we switched away from current. That might not exist
anymore.

> + }
>
> if (!prev->mm) {
> prev->active_mm = NULL;
> @@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas
> * frame will be invalid.
> */
> finish_task_switch(this_rq(), prev);
> +
> + if (wake_ksm)
> + wake_up_interruptible(wake_ksm);
> }

Quite horrible for sure. I really hate seeing KSM cruft all the way down
here. Can't we create a new (timer) infrastructure that does the right
thing? Surely this isn't the only such case.

I know both RCU and some NOHZ_FULL muck already track when the system is
completely idle. This is yet another case of that.

Attachment: pgpFGWj2Gtyym.pgp
Description: PGP signature