Re: [PATCH 21/32] nohz/cpuset: Flush cputime on threads in nohzcpusets when waiting leader

From: Frederic Weisbecker
Date: Wed Mar 28 2012 - 07:20:38 EST


On Tue, Mar 27, 2012 at 04:23:14PM +0200, Gilad Ben-Yossef wrote:
> On Tue, Mar 27, 2012 at 4:10 PM, Gilad Ben-Yossef <gilad@xxxxxxxxxxxxx> wrote:
> > On Wed, Mar 21, 2012 at 3:58 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> >> When we wait for a zombie task, flush the cputimes on nohz cpusets
> >> in case we are waiting for a group leader that has threads running
> >> in nohz CPUs. This way thread_group_times() doesn't report stale
> >> values.
> >>
> >> <doubts>
> >> If I understood the code correctly, by the time we call thread_group_times(),
> >> we may have children that are still running, so this is necessary.
> >> But I need to check deeper.
> >> </doubts>
> >>
> > ...
> >>
> >> diff --git a/kernel/exit.c b/kernel/exit.c
> >> index 4b4042f..c194662 100644
> >> --- a/kernel/exit.c
> >> +++ b/kernel/exit.c
> >> @@ -52,6 +52,7 @@
> >>  #include <linux/hw_breakpoint.h>
> >>  #include <linux/oom.h>
> >>  #include <linux/writeback.h>
> >> +#include <linux/cpuset.h>
> >>
> >>  #include <asm/uaccess.h>
> >>  #include <asm/unistd.h>
> >> @@ -1712,6 +1713,13 @@ repeat:
> >>           (!wo->wo_pid || hlist_empty(&wo->wo_pid->tasks[wo->wo_type])))
> >>                goto notask;
> >>
> >> +       /*
> >> +        * Flush cputime in sub-threads before adding them.
> >> +        * Must be called outside tasklist_lock lock because write lock
> >> +        * can be acquired under irqs disabled.
> >> +        */
> >> +       cpuset_nohz_flush_cputimes();
> >> +
> >>        set_current_state(TASK_INTERRUPTIBLE);
> >>        read_lock(&tasklist_lock);
> >>        tsk = current;
> >> --
> >> 1.7.5.4
> >>
> >
> > I believe this patch is not needed because after this point we call
> > do_wait_thread /ptrace_do_wait, which both call wait_consider_task,
> > which calls wait_task_stopped/zombie/continued, which all eventually
> > call getrusage, which calls k_getrusage where you added a call to
> > cpuset_nohz_flush_cputimes() in another patch :-)
> >
>
> OK, I now see that wait_task_zombie actually calls
> thread_group_times() directly, unlike other wait_task_*
> what I wrote above is not needed.
>
> It does result in more than one IPI for each isolated core (something
> like 3 really) for the other cases though:
> one from this patch and the rest from the one in k_getrusage calls.

Yeah, I realize we may be calling getrusage() from each of the wait_*()
things if the user requests the rusage. That, plus the IPI done in this
patch, is too much.
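
For reference, here is roughly what "the user requests the rusage" looks
like from user space. This is only a minimal sketch: the helper name
reap_child_with_rusage() is made up for illustration, and the busy loop is
just there so the child has some CPU time to account. The point is that a
non-NULL rusage pointer passed to wait4() is what makes the wait path fetch
resource usage for the reaped child.

```c
#define _DEFAULT_SOURCE
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: fork a child that burns a
 * little CPU, then reap it with wait4() and a non-NULL rusage pointer.
 * Returns 0 on success (and fills *ru), or -1 on error.
 */
static int reap_child_with_rusage(struct rusage *ru)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: spin briefly so there is some cputime to report. */
		volatile unsigned long x = 0;
		unsigned long i;

		for (i = 0; i < 50000000UL; i++)
			x += i;
		_exit(0);
	}

	/* Parent: the non-NULL ru is the "user requests the rusage" case. */
	if (wait4(pid, &status, 0, ru) != pid)
		return -1;

	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would declare a struct rusage, pass its address to the helper, and
read ru_utime/ru_stime afterwards. Passing NULL to wait4() instead is the
case where no rusage is requested.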

>
> I wonder what would be a better way to do it. In theory we can send
> the IPI only to nohz cpuset cores that actually
> run tasks from the thread group. Finding which is not trivial though...

I also realize that we only call wait_task_zombie() on group leaders
if they don't have any subthread left (see delay_group_leader() test).
But then we call thread_group_times() to get the time of all threads
in the group from wait_task_zombie().

Now I'm confused.

>
> Gilad
>
> > Gilad
> >
> > --
> > Gilad Ben-Yossef
> > Chief Coffee Drinker
> > gilad@xxxxxxxxxxxxx
> > Israel Cell: +972-52-8260388
> > US Cell: +1-973-8260388
> > http://benyossef.com
> >
> > "If you take a class in large-scale robotics, can you end up in a
> > situation where the homework eats your dog?"
> >  -- Jean-Baptiste Queru